I have a nested structure which I read from XML using spark-xml, and I want to use Spark SQL to convert this nested structure into separate relational tables. A sample row looks like this:
(WrappedArray([WrappedArray([[null,592006340,null],null,BA,M,1724]),N,2017-04-05T16:31:03,586257528),659925562)
which has a schema:
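(The schema itself did not come through above.) Whatever the exact schema, the usual first step is to explode() each array column level by level and register each level as a view. A minimal sketch, in which the column names (id, records, items, status, timestamp), the rowTag, and the path are all assumptions for illustration:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

val spark = SparkSession.builder().appName("xml-flatten").getOrCreate()

// Read the XML; "rowTag" and the path are placeholders.
val df = spark.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "row")
  .load("/path/to/input.xml")

// explode() turns each element of an array column into its own row,
// which is the usual first step toward one-row-per-entity tables.
val outer = df.select(col("id"), explode(col("records")).as("rec"))
val inner = outer.select(
  col("id"),
  col("rec.status"),
  col("rec.timestamp"),
  explode(col("rec.items")).as("item")
)

// Register each level as a view so plain Spark SQL can build the tables.
inner.createOrReplaceTempView("record_items")
spark.sql("SELECT id, status, item FROM record_items").show()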
I have a silly question: do multiple Spark jobs running on YARN have any impact on each other? For example, if the traffic on one streaming job increases too much, does that have any effect on the second job? Will it slow it down, or are there other consequences? I have enough resources (memory, cores) for both jobs in the cluster.
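For what it's worth, if each job runs with a fixed allocation (and dynamic allocation off), YARN keeps the containers separate, so a traffic spike in one job should not take resources away from the other. A minimal sketch of pinning one job's allocation; every number and the queue name are assumptions:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Fixed allocation for one of the two jobs.
val conf = new SparkConf()
  .setAppName("streaming-job-1")
  .set("spark.executor.instances", "4")   // fixed executor count
  .set("spark.executor.cores", "2")
  .set("spark.executor.memory", "4g")
  .set("spark.yarn.queue", "streaming1")  // a dedicated YARN queue, if one exists

val ssc = new StreamingContext(conf, Seconds(10))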
Hi,
I have an RDD with elements of the form ((key1, key2), value), where (key1, key2) is the partitioning key in my Cassandra table. For each such element I have to do a read from the Cassandra table. My Cassandra cluster and my Spark cluster are on different nodes and cannot be co-located.
Right now I am doing
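(The current code did not come through.) For reference, a minimal sketch of the key-based lookup pattern using the spark-cassandra-connector's joinWithCassandraTable, which issues one single-partition read per key instead of a full table scan; the keyspace, table, and column names are placeholders:

import org.apache.spark.rdd.RDD
import com.datastax.spark.connector._

// The table's partition-key columns; field names must match the
// Cassandra column names ("key1"/"key2" here are placeholders).
case class CassKey(key1: String, key2: String)

def lookup(rdd: RDD[((String, String), Long)]) = {
  rdd
    .map { case ((k1, k2), _) => CassKey(k1, k2) }
    // One targeted read per key, batched by the connector,
    // rather than a driver-side query per element.
    .joinWithCassandraTable("my_keyspace", "my_table")
}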
As per the Spark programming guide, "we should have 2-4 partitions for each CPU in your cluster." In that case, how does one CPU core process 2-4 partitions at the same time? Does it context-switch between tasks or run them in parallel? If it does context switching, how is that efficient?
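For context, Spark schedules at most one task per core at a time (with the default spark.task.cpus=1); the extra partitions simply wait in the scheduler's queue, so a core works through its 2-4 partitions one after another rather than interleaving them. A minimal sketch, where the 4-core figure is an assumption:

import org.apache.spark.{SparkConf, SparkContext}

// 8 partitions with 4 cores means 2 waves of 4 tasks: each core
// finishes one task before it starts the next, so there is no
// per-core interleaving to worry about.
val conf = new SparkConf().setAppName("partitions-per-core")
val sc = new SparkContext(conf)

val rdd = sc.parallelize(1 to 1000000, numSlices = 8) // 2 partitions per core
println(rdd.map(_.toLong * 2).sum())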
Hello,
We need a lot of logging in our application: about 1,000 lines have to be logged per message we process, and we process 1,000 msgs/sec, so in total 1,000 x 1,000 = 1,000,000 lines/sec have to be logged. As this is all written to a file, will writing so many logs impact the processing power of Spark?
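As a back-of-envelope check (the bytes-per-line figure is an assumption):

// Rough write volume implied by the numbers above.
val linesPerSec  = 1000L * 1000L              // 1,000 lines/msg * 1,000 msgs/sec
val bytesPerLine = 200L                       // assumed average line length
val mbPerSec     = linesPerSec * bytesPerLine / (1024.0 * 1024.0)
println(f"~$mbPerSec%.0f MB/s of log writes") // ~191 MB/s with these assumptions

At that volume a synchronous file appender would block processing threads on disk I/O, so some impact seems plausible unless the logging is made asynchronous or reduced.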