Re: Spark Performance on Yarn

2015-04-21 Thread hnahak
Try --executor-memory 5g, because each machine has 8 GB of RAM. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Performance-on-Yarn-tp21729p22603.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
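The reasoning behind leaving headroom on an 8 GB machine can be sketched as simple arithmetic. On YARN each executor runs in a container that is larger than --executor-memory, because Spark adds an off-heap overhead on top. The helper below is a rough sketch, not Spark's own code: the object name YarnSizing is hypothetical, and the 7% factor and 384 MB floor are assumptions based on the spark.yarn.executor.memoryOverhead default of Spark 1.3-era releases (later releases use a different factor). It also ignores YARN rounding the request up to its minimum allocation size.

```scala
// Hypothetical helper: estimates the YARN container size for one executor.
object YarnSizing {
  // Assumed defaults for spark.yarn.executor.memoryOverhead (Spark 1.3 era).
  val OverheadFraction = 0.07
  val MinOverheadMb = 384

  // Off-heap overhead added on top of the executor heap.
  def overheadMb(executorMb: Int): Int =
    math.max(MinOverheadMb, (executorMb * OverheadFraction).toInt)

  // Total memory requested from YARN for one executor.
  def containerMb(executorMb: Int): Int =
    executorMb + overheadMb(executorMb)

  // True if one executor container fits in the memory the NodeManager offers.
  def fits(executorMb: Int, nodeManagerMb: Int): Boolean =
    containerMb(executorMb) <= nodeManagerMb
}
```

Under these assumptions, YarnSizing.containerMb(5120) gives 5504 MB for a 5g executor, which fits on a node whose yarn.nodemanager.resource.memory-mb is set to 6144, whereas a 6g executor would not.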

Re: newAPIHadoopRDD file name

2015-04-19 Thread hnahak
At the record reader level you can pass the file name as the key or the value: sc.newAPIHadoopRDD(job.getConfiguration, classOf[AvroKeyInputFormat[myObject]], classOf[AvroKey[myObject]], classOf[Text]) // the Text value can contain your file name. AvroKeyInputFormat extends
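A minimal sketch of the idea of emitting the file name from the record reader, assuming plain text lines instead of the Avro records used in the thread. The class names FileNameLineRecordReader and FileNameTextInputFormat are hypothetical; the reader wraps Hadoop's LineRecordReader and replaces the byte-offset key with the name of the file the split belongs to.

```scala
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.{InputSplit, RecordReader, TaskAttemptContext}
import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, FileSplit, LineRecordReader}

// Hypothetical reader: emits (file name, line) instead of (byte offset, line).
class FileNameLineRecordReader extends RecordReader[Text, Text] {
  private val lines = new LineRecordReader()
  private val fileName = new Text()

  override def initialize(split: InputSplit, context: TaskAttemptContext): Unit = {
    // FileInputFormat hands out FileSplits, which know their source path.
    fileName.set(split.asInstanceOf[FileSplit].getPath.getName)
    lines.initialize(split, context)
  }

  override def nextKeyValue(): Boolean = lines.nextKeyValue()
  override def getCurrentKey(): Text = fileName
  override def getCurrentValue(): Text = lines.getCurrentValue()
  override def getProgress(): Float = lines.getProgress()
  override def close(): Unit = lines.close()
}

// Hypothetical input format that plugs the reader above into Spark or MapReduce.
class FileNameTextInputFormat extends FileInputFormat[Text, Text] {
  override def createRecordReader(
      split: InputSplit,
      context: TaskAttemptContext): RecordReader[Text, Text] =
    new FileNameLineRecordReader
}
```

From the driver this could be read with sc.newAPIHadoopFile(path, classOf[FileNameTextInputFormat], classOf[Text], classOf[Text]). Hadoop reuses the Text objects between records, so it is safer to map each pair to (key.toString, value.toString) before caching or collecting.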

Re: GraphX: unbalanced computation and slow runtime on livejournal network

2015-04-19 Thread hnahak
Hi Steve, I benchmarked PageRank with Spark 1.3.0 on soc-LiveJournal1 in a 4-node cluster with 16, 16, 8, and 8 GB of RAM respectively. The cluster has 4 workers, including the master, with 4, 4, 2, and 2 CPUs. I set executor memory to 3g and driver memory to 5g. No. of Iterations -- GraphX(mins) 1 -- 1 2

Re: GraphX:java.lang.NoSuchMethodError:org.apache.spark.graphx.Graph$.apply

2015-02-26 Thread hnahak
I am able to run it without any issue, both standalone and on the cluster: spark-submit --class org.graphx.test.GraphFromVerteXEdgeArray --executor-memory 1g --driver-memory 6g --master spark://VM-Master:7077 spark-graphx.jar The code is exactly the same as above.

Re: Posting to the list

2015-02-22 Thread hnahak
I'm also facing the same issue. This is the third time that a post of mine was not accepted by the list, and each time I received a failure mail at my registered mail ID. When I click the link to subscribe to this mailing list, I don't get any new subscription mail in my inbox. Please anyone

How to send user variables from Spark client to custom InputFormat or RecordReader ?

2015-02-22 Thread hnahak
Hi, I have written a custom InputFormat and RecordReader for Spark, and I need to use user variables from the Spark client program. I added them to SparkConf: val sparkConf = new SparkConf().setAppName(args(0)).set("developer", "MyName") and in the InputFormat class: protected boolean

Re: How to send user variables from Spark client to custom InputFormat or RecordReader ?

2015-02-22 Thread hnahak
Instead of setting it in SparkConf, set it with sparkContext.hadoopConfiguration.set(key, value) and read the same key from the JobContext. --Harihar
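The suggestion above might look roughly like the following sketch. The names DeveloperAwareInputFormat and PassUserVariable are hypothetical, the input format extends Hadoop's TextInputFormat only to keep the example short, and the key "developer" with value "MyName" is taken from the original question. The driver writes the key into the Hadoop configuration, and the input format reads it back from the JobContext when it plans splits.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.{InputSplit, JobContext}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical input format that reads a user variable set by the driver.
class DeveloperAwareInputFormat extends TextInputFormat {
  override def getSplits(job: JobContext): java.util.List[InputSplit] = {
    // The key is looked up in the Hadoop configuration, not in SparkConf.
    val developer = job.getConfiguration.get("developer", "unknown")
    println(s"planning splits for developer=$developer")
    super.getSplits(job)
  }
}

object PassUserVariable {
  def main(args: Array[String]): Unit = {
    // The master URL is expected to come from spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("pass-user-variable"))

    // Set the user variable on the Hadoop configuration before creating the RDD.
    sc.hadoopConfiguration.set("developer", "MyName")

    // newAPIHadoopFile uses sc.hadoopConfiguration by default.
    val lines = sc.newAPIHadoopFile(
      args(0),
      classOf[DeveloperAwareInputFormat],
      classOf[LongWritable],
      classOf[Text])

    println(lines.count())
    sc.stop()
  }
}
```

A RecordReader can read the same key in its initialize method through context.getConfiguration, since TaskAttemptContext is also a JobContext.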

Re: connector for CouchDB

2015-02-03 Thread hnahak
Spark doesn't support it, but this connector is open source; you can get it from GitHub. The difference between these two DBs depends on what type of solution you are looking for. Please refer to this link: http://blog.nahurst.com/visual-guide-to-nosql-systems FYI, from the list of NOSQL in

Re: We are migrating Tera Data SQL to Spark SQL. Query is taking long time. Please have a look on this issue

2015-01-29 Thread hnahak
Do set executor memory as well. You have RAM and storage in each node. Set it to 6 GB or more; if required, increase driver memory beyond 10 GB. --Harihar

Re: Data Locality

2015-01-28 Thread hnahak
I have written a custom input split and I want to assign it to the specific node where my data is stored. Currently, however, the split can start on any node and pick up data from a different node in the cluster. Any suggestions on how to set the host in Spark?

Re: data locality in logs

2015-01-28 Thread hnahak
Hi, how do I set a preferred location for an InputSplit in Spark standalone? I have data on a specific machine and I want to read it using splits created for that node only, by assigning some property that helps Spark create the split on that node only.
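One way this is commonly approached, sketched here under assumptions rather than as a confirmed answer from the thread: a split for the new Hadoop API reports its hosts through getLocations, and Spark uses those hosts as preferred locations when scheduling the task that reads the split. The class name HostPinnedSplit is hypothetical, and a custom InputFormat would still have to return such splits from getSplits.

```scala
import java.io.{DataInput, DataOutput}
import org.apache.hadoop.io.Writable
import org.apache.hadoop.mapreduce.InputSplit

// Hypothetical split that remembers which host holds its data.
class HostPinnedSplit(private var host: String, private var length: Long)
    extends InputSplit with Writable {

  // Writables are instantiated reflectively, so a no-arg constructor is needed.
  def this() = this("", 0L)

  override def getLength(): Long = length

  // The hosts returned here are treated as preferred locations for the task.
  override def getLocations(): Array[String] = Array(host)

  // Serialize the split so it can be shipped from the driver to an executor.
  override def write(out: DataOutput): Unit = {
    out.writeUTF(host)
    out.writeLong(length)
  }

  override def readFields(in: DataInput): Unit = {
    host = in.readUTF()
    length = in.readLong()
  }
}
```

The location is a preference, not a guarantee: if no slot frees up on that host within spark.locality.wait, the task may run elsewhere. The host string also needs to match the name under which the worker is known to Spark, which is worth checking in the master web UI.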