HDFS

2015-12-11 Thread shahid ashraf
Hi folks, I am using a standalone cluster of 50 servers on AWS. I loaded data onto HDFS; why am I getting a Locality Level of ANY for data on HDFS? I have 900+ partitions. -- with Regards Shahid Ashraf
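A locality level of ANY usually means the scheduler gave up waiting for an executor that holds a local copy of the HDFS block. One knob to try is the locality wait; the values below are illustrative assumptions, not recommendations:

```
# spark-defaults.conf -- illustrative values only
spark.locality.wait       10s   # how long to wait for a data-local task slot (default 3s)
spark.locality.wait.node  10s   # wait specifically for a node-local slot
```

Note that if the executors are not co-located with the HDFS datanodes (e.g., separate compute and storage clusters), every task will report ANY regardless of this setting.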

Re: issue with spark.driver.maxResultSize parameter in spark 1.3

2015-11-01 Thread shahid ashraf
0 MBs. Any idea how I can try to even the data distribution across multiple nodes? On Fri, Oct 30, 2015 at 12:09 AM, shahid ashraf <sha...@trialx.com> wrote: Hi, I guess you need to increase Spark driver

Re: issue with spark.driver.maxResultSize parameter in spark 1.3

2015-10-29 Thread shahid ashraf
Hi, I guess you need to increase Spark driver memory as well, but that should be set in the conf files. Let me know if that resolves it. On Oct 30, 2015 7:33 AM, "karthik kadiyam" wrote: Hi, in a Spark streaming job I had the following setting
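A sketch of the "set in conf files" point: spark.driver.memory only takes effect before the driver JVM starts, so in client mode it belongs in spark-defaults.conf or on the spark-submit command line, not in SparkConf set at runtime. The values below are placeholders, not recommendations:

```
# spark-defaults.conf -- placeholder values
spark.driver.memory         4g
spark.driver.maxResultSize  2g
```

Equivalently on the command line: spark-submit --driver-memory 4g --conf spark.driver.maxResultSize=2g ...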

Re: repartition vs partitionby

2015-10-18 Thread shahid ashraf
to write a custom partitioner to help Spark distribute the data more uniformly. Sent from my iPhone. On 17 Oct 2015, at 16:14, shahid ashraf <sha...@trialx.com> wrote: Yes, I know about that; it's for the case of reducing partitions. The point here is the data is

Re: repartition vs partitionby

2015-10-17 Thread shahid ashraf
portions have large data (data skew). I have pairRDDs [({},{}),({},{}),({},{})]. What is the best way to solve the problem? -- with Regards Shahid Ashraf
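A common fix for this kind of skew is key salting with a custom partition function. The sketch below is plain Python and purely illustrative (NUM_PARTITIONS, SALT_BUCKETS, and the helper names are assumptions, not from the thread); in PySpark it would be wired in via rdd.partitionBy(numPartitions, partitionFunc):

```python
import random

NUM_PARTITIONS = 50   # assumed partition count
SALT_BUCKETS = 10     # assumed number of sub-keys per hot key

def add_salt(key):
    # Replace a hot key with (key, salt) so its records can land
    # in several partitions instead of piling up in one.
    return (key, random.randrange(SALT_BUCKETS))

def salted_partition_func(salted_key):
    # Deterministically map a (key, salt) pair to a partition id.
    key, salt = salted_key
    return (hash(key) + salt) % NUM_PARTITIONS

# Rough PySpark wiring (not runnable outside a Spark job):
# rdd.map(lambda kv: (add_salt(kv[0]), kv[1])) \
#    .partitionBy(NUM_PARTITIONS, salted_partition_func)
```

The trade-off: you aggregate per salted key first, then strip the salt and combine again, exchanging one skewed shuffle for two balanced ones.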

IndexedRDD in PySpark

2015-09-03 Thread shahid ashraf
Hi folks, any resource to get started using https://github.com/amplab/spark-indexedrdd in PySpark? -- with Regards Shahid Ashraf

Re: Custom Partitioner

2015-09-02 Thread shahid ashraf
class RangePartitioner(Partitioner):
    def __init__(self, numParts):
        self.numPartitions = numParts
        self.partitionFunc = self.rangePartition
    def rangePartition(self, key):
        # Logic to turn the key into a partition id
        return partition_id
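To make the quoted sketch concrete: the partition logic itself can be an ordinary function. Below is a minimal range-partition function in plain Python (the split points in BOUNDARIES are invented for illustration); in PySpark it could be passed as partitionFunc to rdd.partitionBy:

```python
import bisect

# Assumed split points: key < 10 -> partition 0,
# 10 <= key < 100 -> partition 1, key >= 100 -> partition 2.
BOUNDARIES = [10, 100]

def range_partition(key):
    # bisect_right counts how many boundaries the key has passed,
    # which is exactly the partition index.
    return bisect.bisect_right(BOUNDARIES, key)

# Rough PySpark wiring (not runnable outside a Spark job):
# rdd.partitionBy(len(BOUNDARIES) + 1, range_partition)
```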

Re: Memory-efficient successive calls to repartition()

2015-09-02 Thread shahid ashraf
.repartition(50).persist() data2.count() # materialize rdd data.unpersist() # unpersist previous version data = data2. Help and suggestions on this would be greatly appreciated! Thanks a lot! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Memory-efficient-successive-calls-to-repartition-tp24358.html Sent from the Apache Spark User List mailing list archive at Nabble.com. -- with Regards Shahid Ashraf

ERROR WHILE REPARTITION

2015-09-02 Thread shahid ashraf
nager.runAll(Utils.scala:2278) at org.apache.spark.util.SparkShutdownHookManager$$anon$6.run(Utils.scala:2260) at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54) -- with Regards Shahid Ashraf

Re: Custom Partitioner

2015-09-01 Thread shahid ashraf
wrote: Hi Sparkians, how can we create a custom partitioner in PySpark? -- with Regards Shahid Ashraf

Re: Custom Partitioner

2015-09-01 Thread shahid ashraf
just need to instantiate the Partitioner class with numPartitions and partitionFunc. On Tue, Sep 1, 2015 at 11:13 AM shahid ashraf <sha...@trialx.com> wrote: Hi, I did not get this; e.g. if I need to create a custom partitioner like a range partitione

Re: Spark EC2 launch problem

2015-08-21 Thread shahid ashraf
launch a Spark cluster but keep getting the following message endlessly. Please help. Warning: SSH connection error. (This could be temporary.) Host: SSH return code: 255 SSH output: ssh: Could not resolve hostname : Name or service not known -- with Regards Shahid Ashraf
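The empty hostname in "Could not resolve hostname :" usually means the instances had no public DNS name yet when the script tried to SSH in (or the wrong region/zone was queried). spark-ec2 can retry the SSH and setup phase without relaunching the instances via --resume; the command below is a sketch with placeholder key-pair, region, and cluster names:

```
# placeholder names -- substitute your own key pair, region, and cluster
./spark-ec2 -k my-keypair -i my-keypair.pem --region=us-east-1 --resume launch my-cluster
```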

Re: How to increase parallelism of a Spark cluster?

2015-08-03 Thread shahid ashraf
? Or is there something I am doing wrong? Thank you in advance for any pointers you can provide. -sujit -- with Regards Shahid Ashraf

Re: No. of Task vs No. of Executors

2015-07-21 Thread shahid ashraf
at Nabble.com. -- Best Regards, Ayan Guha -- with Regards Shahid Ashraf

Re: Research ideas using spark

2015-07-15 Thread shahid ashraf
think this question is better suited for Stack Overflow than for a PhD thesis. On Tue, Jul 14, 2015 at 7:42 AM, shahid ashraf sha...@trialx.com wrote: Hi, I have a 10-node cluster. I loaded the data onto HDFS, so the number of partitions I get is 9. I am running a Spark application; it gets

RE: Code review - Spark SQL command-line client for Cassandra

2015-06-20 Thread shahid ashraf
Hi Mohammed, can you provide more info about the service you developed? On Jun 20, 2015 7:59 AM, Mohammed Guller moham...@glassbeam.com wrote: Hi Matthew, it looks fine to me. I have built a similar service that allows a user to submit a query from a browser and returns the result in JSON

Re: Exception when trying to use EShadoop connector and writing rdd to ES

2015-02-10 Thread shahid ashraf
of the mailing list. Let's continue the discussion there. Cheers. On 2/10/15 6:58 PM, shahid ashraf wrote: Thanks Costin, I am grouping data together based on id in JSON, and the RDD contains rdd = (1,{'SOURCES': [{n no. of key/valu}],}),(2,{'SOURCES': [{n no. of key/valu}],}),(3,{'SOURCES': [{n