SparkContext taking time after adding jars and asking yarn for resources

2018-05-04 Thread neeravsalaria
In my production setup, Spark always takes about 40 seconds between each of the
steps below, as if a fixed timeout were set. In my local lab the same steps take
about 1 second. I have not been able to find the root cause of this behaviour.
My Spark application runs on the Hortonworks platform in YARN client mode.
Can someone explain what is happening between these steps?

18/05/04 *07:56:45* INFO spark.SparkContext: Added JAR
file:/app/jobs/jobs.jar at spark://10.233.69.5:37668/jars/jobs.jar with
timestamp 1525420605369
18/05/04 *07:57:26* WARN shortcircuit.DomainSocketFactory: The short-circuit
local reads feature cannot be used because libhadoop cannot be loaded.
18/05/04 *07:58:12* INFO client.AHSProxy: Connecting to Application History
server at gsidev001-mgt-01.thales.fr/192.168.1.11:10200
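
One way to see what the driver is doing between these log lines is to raise the
driver's log level to DEBUG. A minimal sketch, assuming the default
log4j.properties template that ships with Spark; the two logger names below are
only a suggested starting point, and the actual cause of the gap still has to be
read out of the resulting output:

 # conf/log4j.properties (start from the log4j.properties.template shipped with Spark)
 # Log every step the driver takes between "Added JAR" and the YARN/timeline-server calls.
 log4j.rootCategory=DEBUG, console
 # Optionally keep the noise down by targeting only the client-side YARN classes:
 log4j.logger.org.apache.spark.deploy.yarn=DEBUG
 log4j.logger.org.apache.hadoop.yarn.client=DEBUG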






Re: Is there a way to limit the sql query result size?

2015-06-11 Thread neeravsalaria
Hi Eric, 

   We are also running into the same issue. Were you able to find a suitable
solution to this problem?

Best Regards
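
For reference, one common way to cap a query's result size is to apply a LIMIT,
either in the SQL text or on the resulting DataFrame. A minimal sketch assuming
the Spark 1.x DataFrame API; the table name and row count here are illustrative:

 // Cap the rows in the SQL itself...
 val limited = sqlContext.sql("SELECT * FROM events LIMIT 1000")
 // ...or equivalently on the DataFrame:
 val limitedDf = sqlContext.table("events").limit(1000)
 // collect() then brings back at most 1000 rows to the driver
 val rows = limitedDf.collect()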






Re: Spark running slow for small hadoop files of 10 mb size

2014-04-24 Thread neeravsalaria
Thanks for the reply. It indeed increased the usage. There was another issue we
found: we were broadcasting the Hadoop configuration by writing our own wrapper
class over it, but then found the proper way in the Spark code:

sc.broadcast(new SerializableWritable(conf))
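
For context, a slightly fuller sketch of that pattern against the Spark 1.x API
of the time (SerializableWritable was public then; everything other than
SerializableWritable itself is illustrative):

 import org.apache.hadoop.conf.Configuration
 import org.apache.spark.SerializableWritable

 // Hadoop's Configuration is not Serializable, so wrap it before broadcasting.
 val hadoopConf: Configuration = sc.hadoopConfiguration
 val confBroadcast = sc.broadcast(new SerializableWritable(hadoopConf))

 // On the executors, unwrap twice to get the Configuration back:
 // val conf: Configuration = confBroadcast.value.value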







Spark running slow for small hadoop files of 10 mb size

2014-04-21 Thread neeravsalaria
Hi, 

  I have been using MapReduce to analyze multiple files whose sizes range from
10 MB to 200 MB per file. Recently I planned to move to Spark, but my Spark job
takes too much time processing a single file when the file size is 10 MB and the
HDFS block size is 64 MB. It executes on a single datanode and on a single core
(my cluster is a 4-node setup, each node having 32 cores). Each file has about
3 million rows, and I have to analyze every row (none can be skipped) and build
a set of info from it.

Is there a way to parallelize the processing of the file, either on other nodes
or by using the remaining cores of the same node?
 


demo code:

 import org.apache.hadoop.io.{BytesWritable, NullWritable}

 val recordsRDD =
   sc.sequenceFile[NullWritable, BytesWritable](FilePath, 256) /* minPartitions = 256, to parallelize */

 /* info_func must return a (key, value) pair for the reduceByKey below to work */
 val infoRDD = recordsRDD.map(record => info_func(record))

 val hdfs_RDD = infoRDD.reduceByKey(_ + _, 48) /* makes 48 partitions */

 /* saveAsNewAPIHadoopFile also needs an output path, key/value classes and an OutputFormat */
 hdfs_RDD.saveAsNewAPIHadoopFile(...)
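
If the minPartitions hint alone does not spread a small, single-block file
across cores, one possible approach is an explicit repartition after reading,
which forces a shuffle that redistributes the records. A sketch building on the
RDDs above; the partition count is illustrative:

 /* Spread the records of the single-block file over more partitions, so that
    more cores (and other nodes) can run the downstream map in parallel. */
 val spreadRDD = recordsRDD.repartition(128)
 val infoRDD = spreadRDD.map(record => info_func(record)) /* instead of mapping recordsRDD directly */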


