WrappedArray to row of relational Db

2017-04-26 Thread vaibhavrtk
I have a nested structure which I read from XML using spark-xml. I want to use Spark SQL to convert this nested structure into different relational tables (WrappedArray([WrappedArray([[null,592006340,null],null,BA,M,1724]),N,2017-04-05T16:31:03,586257528),659925562) which has a schema:
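A minimal sketch of one way such a nested array can be flattened with Spark SQL's explode, assuming hypothetical column names (id, items) and row tag, since the actual schema is not shown in the post:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().appName("xml-flatten").getOrCreate()

    // Read the XML; "record" is an assumed row tag, not taken from the original post.
    val df = spark.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "record")
      .load("input.xml")

    // explode() turns each element of a nested array column into its own row,
    // which can then be written out as a separate relational table.
    val children = df
      .select(col("id"), explode(col("items")).as("item"))
      .select(col("id"), col("item.*"))

    children.write.mode("overwrite").parquet("items_table")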

Running multiple Spark Jobs on Yarn (Client mode)

2016-07-20 Thread vaibhavrtk
I have a silly question: do multiple Spark jobs running on YARN have any impact on each other? For example, if the traffic on one streaming job increases too much, does it have any effect on the second job? Will it slow it down, or are there other consequences? I have enough resources (memory, cores) for both jobs in
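For context, a sketch of how each application can be given its own fixed executor allocation on YARN, so the two jobs only contend when the cluster itself runs out of capacity; the instance, core, and memory values are illustrative, not from the post:

    import org.apache.spark.sql.SparkSession

    // Each application gets its own YARN containers; a second streaming job
    // started the same way does not take executors away from this one.
    val spark = SparkSession.builder()
      .appName("streaming-job-one")
      .master("yarn")
      .config("spark.executor.instances", "4")
      .config("spark.executor.cores", "2")
      .config("spark.executor.memory", "4g")
      .getOrCreate()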

JoinWithCassandraTable over individual queries

2016-04-25 Thread vaibhavrtk
Hi, I have an RDD with elements as tuples ((key1,key2),value), where (key1,key2) is the partitioning key in my Cassandra table. Now for each such element I have to do a read from the Cassandra table. My Cassandra table and Spark cluster are on different nodes and can't be co-located. Right now I am doing
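One alternative to per-element reads is the connector's joinWithCassandraTable, sketched below with placeholder keyspace, table, host, and column names:

    import com.datastax.spark.connector._
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("cassandra-join")
      .set("spark.cassandra.connection.host", "cassandra-host")  // placeholder host
    val sc = new SparkContext(conf)

    // RDD keyed by the Cassandra partition key (key1, key2).
    val keys = sc.parallelize(Seq(((1, "a"), 10.0), ((2, "b"), 20.0)))
      .map { case ((k1, k2), _) => (k1, k2) }

    // joinWithCassandraTable reads only the partitions matching the RDD keys,
    // instead of issuing one point query per element from the driver.
    val joined = keys.joinWithCassandraTable("my_keyspace", "my_table")
      .on(SomeColumns("key1", "key2"))

    joined.collect().foreach(println)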

Relation between number of partitions and cores.

2016-04-01 Thread vaibhavrtk
The Spark programming guide says "we should have 2-4 partitions for each CPU in your cluster." In this case, how does 1 CPU core process 2-4 partitions at the same time? Does it do context switching between tasks or run them in parallel? If it does context switching, how is it efficient
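A small local illustration of the scheduling model, under the usual assumption that each core runs one task (one partition) at a time and the remaining partitions are processed in later waves rather than by per-core context switching:

    import org.apache.spark.{SparkConf, SparkContext}

    // 2 cores but 8 partitions: at most 2 tasks run concurrently, so the
    // 8 partition-tasks complete in roughly 4 successive waves.
    val conf = new SparkConf().setAppName("partitions-vs-cores").setMaster("local[2]")
    val sc = new SparkContext(conf)

    val rdd = sc.parallelize(1 to 1000, numSlices = 8)
    println(s"partitions = ${rdd.getNumPartitions}, cores = 2")

    // One task per partition; each task processes its whole partition before
    // its core slot is given to the next waiting task.
    val counts = rdd.mapPartitions(it => Iterator(it.size)).collect()
    println(counts.mkString(", "))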

Is one batch created by Streaming Context always equal to one RDD?

2015-10-19 Thread vaibhavrtk
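For reference, a minimal sketch (with a placeholder socket source) showing how, for a basic input DStream, each batch interval is handed to foreachRDD as a single RDD:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("batch-vs-rdd").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Placeholder source; any basic input DStream behaves the same way.
    val lines = ssc.socketTextStream("localhost", 9999)

    // foreachRDD is invoked once per batch interval with that batch's RDD.
    lines.foreachRDD { (rdd, time) =>
      println(s"batch at $time -> one RDD with ${rdd.getNumPartitions} partitions")
    }

    ssc.start()
    ssc.awaitTermination()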

log4j Spark-worker performance problem

2015-09-28 Thread vaibhavrtk
Hello. We need a lot of logging for our application: about 1000 lines need to be logged per message we process, and we process 1000 msgs/sec, so about 1000 * 1000 = 1,000,000 lines per second, all of which will be written to a file. Will writing so many logs impact the processing power of
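One common mitigation, sketched here assuming slf4j/log4j is on the classpath: keep the high-volume per-message output at DEBUG and guard it, so a production level of INFO skips both the disk write and the string construction:

    import org.slf4j.LoggerFactory

    object MessageProcessor {
      private val log = LoggerFactory.getLogger(getClass)

      def process(msg: String): Unit = {
        // High-volume, per-message detail stays at DEBUG; the guard avoids
        // even building the log string when DEBUG is disabled in production.
        if (log.isDebugEnabled) {
          log.debug(s"processing message: $msg")
        }
        // ... actual processing ...
        log.info("batch checkpoint reached")  // low-volume operational logging only
      }
    }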