SparkContext taking a long time after adding jars and asking YARN for resources
In my production setup, Spark always takes about 40 seconds between these steps, as if a fixed timer were set; in my local lab the same steps take exactly 1 second. I am not able to find the root cause of this behaviour. My Spark application runs on the Hortonworks platform in YARN client mode. Can someone explain what is happening between these steps?

18/05/04 *07:56:45* INFO spark.SparkContext: Added JAR file:/app/jobs/jobs.jar at spark://10.233.69.5:37668/jars/jobs.jar with timestamp 1525420605369
18/05/04 *07:57:26* WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
18/05/04 *07:58:12* INFO client.AHSProxy: Connecting to Application History server at gsidev001-mgt-01.thales.fr/192.168.1.11:10200
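Nothing in the listing itself identifies the cause, but one way to see what the driver is doing inside those gaps is to raise the log level for the YARN and Hadoop client packages. A minimal sketch, assuming the stock conf/log4j.properties that ships with Spark (log4j 1.x syntax); these package names are the usual suspects for this startup phase, not a confirmed diagnosis:

    # conf/log4j.properties -- surface the YARN/Hadoop client activity
    # that falls between the INFO lines above
    log4j.logger.org.apache.spark.deploy.yarn=DEBUG
    log4j.logger.org.apache.hadoop.yarn.client=DEBUG
    log4j.logger.org.apache.hadoop.ipc=DEBUG

With those loggers at DEBUG, the otherwise silent 40-second windows should show whether the driver is retrying an RPC, waiting on a timeout, or doing something else entirely.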
Re: Is there a way to limit the SQL query result size?
Hi Eric,

We are also running into the same issue. Were you able to find a suitable solution to this problem?

Best Regards
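For anyone finding this thread later, two mechanisms are commonly used for this; a minimal sketch, assuming a table named some_table (on pre-2.0 releases substitute sqlContext for spark):

    // Cap the number of rows the query returns.
    val rows = spark.sql("SELECT * FROM some_table LIMIT 1000").collect()
    // Equivalent DataFrame/Dataset API:
    // spark.table("some_table").limit(1000).collect()

Separately, spark.driver.maxResultSize (available since Spark 1.2) caps the total serialized result size fetched back to the driver; a job whose results exceed it is aborted instead of exhausting driver memory, e.g. --conf spark.driver.maxResultSize=1g on spark-submit.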
Re: Spark running slow for small Hadoop files of 10 MB size
Thanks for the reply. It indeed increased the core usage. We found another issue as well: we were broadcasting the Hadoop configuration by writing a wrapper class over it, but then found the proper way in the Spark code itself:

    sc.broadcast(new SerializableWritable(conf))
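For reference, a minimal sketch of that pattern as of the Spark 1.x era of this thread, assuming an existing SparkContext. SerializableWritable (org.apache.spark.SerializableWritable) wraps any Hadoop Writable, which Configuration implements, so the otherwise non-serializable configuration can be broadcast:

    import org.apache.hadoop.conf.Configuration
    import org.apache.spark.{SerializableWritable, SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("broadcast-hadoop-conf"))
    // Wrap the driver-side Hadoop configuration and broadcast it once.
    val confBc = sc.broadcast(new SerializableWritable(sc.hadoopConfiguration))

    sc.parallelize(1 to 4).foreachPartition { _ =>
      // Unwrap on the executor: the first .value yields the
      // SerializableWritable, the second yields the Configuration inside it.
      val conf: Configuration = confBc.value.value
      // ... use conf, e.g. to open a FileSystem on the worker
    }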
Spark running slow for small Hadoop files of 10 MB size
Hi,

I have been using MapReduce to analyze multiple files whose size ranges from 10 MB to 200 MB per file. Recently I planned to move to Spark, but my Spark job takes too much time to process a single file when the file is 10 MB and the HDFS block size is 64 MB. It executes on a single datanode and on a single core (my cluster is a 4-node setup, each node having 32 cores). Each file has 3 million rows, and I have to analyze every row (ignoring none) and create a set of info from it. Is there a way I can parallelize the processing of the file, either across other nodes or using the remaining cores of the same node?

Demo code:

    val recordsRDD = sc.sequenceFile[NullWritable, BytesWritable](filePath, 256) // to parallelize
    val infoRDD = recordsRDD.map(f => info_func(f))   // info_func must return a (key, value) pair
    val hdfsRDD = infoRDD.reduceByKey(_ + _, 48)      // makes 48 partitions
    hdfsRDD.saveAsNewAPIHadoopFile(...)               // output args elided
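One pattern that addresses both symptoms here (a sub-block-size file arriving as a single partition, and the analysis then running on one core) is to copy the rows out and repartition before the expensive map. A minimal sketch, assuming an existing SparkContext sc, an input directory hdfs:///data/input, and a hypothetical analyze function standing in for info_func:

    import org.apache.hadoop.io.{BytesWritable, NullWritable}
    import org.apache.spark.SparkContext._   // needed for reduceByKey on Spark < 1.3

    // Hypothetical per-row analysis: derive a key and a count from each record.
    def analyze(row: Array[Byte]): (String, Long) =
      (new String(row, "UTF-8").take(8), 1L)

    val rows = sc.sequenceFile[NullWritable, BytesWritable]("hdfs:///data/input")
      // Hadoop reuses Writable instances, so copy the bytes out before any
      // shuffle; otherwise every shuffled record aliases the same buffer.
      .map { case (_, bytes) => bytes.copyBytes() }
      // A 10 MB file fits in one 64 MB block and therefore one partition;
      // an explicit repartition spreads the rows across all cores and nodes
      // at the cost of one shuffle of roughly 10 MB.
      .repartition(128)

    val reduced = rows.map(analyze).reduceByKey(_ + _, 48)

Also worth noting: passing a directory or glob to sequenceFile reads all the files into one RDD, so each file contributes at least one partition and the per-file job-scheduling overhead is amortized.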