hi Folks
I am using a standalone cluster of 50 servers on AWS. I loaded the data onto
HDFS. Why am I getting a Locality Level of ANY for data on HDFS? I have 900+
partitions.
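(Not an answer by itself, but ANY often just means the executors are not running on the DataNodes that hold the blocks, or the scheduler gave up waiting for a local slot. The first knobs people usually check are the locality wait settings; the values below are just the stock defaults, shown for illustration:)

```
# spark-defaults.conf sketch (illustrative; these are the documented defaults)
spark.locality.wait         3s   # how long to wait before falling back to a less-local level
spark.locality.wait.node    3s   # node-local -> rack-local fallback
spark.locality.wait.rack    3s   # rack-local -> ANY fallback
```

Raising these trades scheduling latency for locality; lowering them does the opposite.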
--
with Regards
Shahid Ashraf
>>
>> Any idea how I can try to even out the data distribution across multiple
>> nodes?
>>
>> On Fri, Oct 30, 2015 at 12:09 AM, shahid ashraf <sha...@trialx.com>
>> wrote:
>>
Hi
I guess you need to increase the Spark driver memory as well, but that should
be set in the conf files.
Let me know if that resolves it.
On Oct 30, 2015 7:33 AM, "karthik kadiyam"
wrote:
> Hi,
>
> In the Spark streaming job I had the following setting
>
>
to write a custom partitioner to help
> Spark distribute the data more uniformly.
>
> Sent from my iPhone
>
> On 17 Oct 2015, at 16:14, shahid ashraf <sha...@trialx.com> wrote:
>
> yes, I know about that; that's for the case of reducing partitions. The point here is
> that the data is skewed:
>> some portions have large data (data skew)
>>
>> i have pairRDDs [({},{}),({},{}),({},{})]
>>
>> what is the best way to solve the problem?
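One common approach to this kind of skew (just a sketch, not the only answer) is to salt the hot keys, so records sharing one heavy key spread over several partitions; the names and the salt count below are my own illustrative assumptions:

```python
import random

NUM_SALTS = 4  # illustrative; tune to the observed skew

def salt_key(key):
    """Spread records sharing a hot key across NUM_SALTS sub-keys."""
    return (key, random.randrange(NUM_SALTS))

def unsalt_key(salted_key):
    """Recover the original key after the shuffle-heavy stage."""
    return salted_key[0]

# With a pair RDD this would look roughly like:
#   salted = rdd.map(lambda kv: (salt_key(kv[0]), kv[1]))
#   ... aggregate on the salted keys first, then re-aggregate on unsalt_key ...
```

The cost is a second aggregation pass, which is usually cheaper than one straggler task holding all the hot records.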
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>
--
with Regards
Shahid Ashraf
Hi Folks
Are there any resources to get started using https://github.com/amplab/spark-indexedrdd
in PySpark?
--
with Regards
Shahid Ashraf
> >
> > class RangePartitioner(Partitioner):
> >     def __init__(self, numParts):
> >         self.numPartitions = numParts
> >         self.partitionFunction = self.rangePartition
> >     def rangePartition(self, key):
> >         # Logic to turn the key into a partition id
> >         return partition_id
>
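For what it's worth, as far as I know stock PySpark has no public Partitioner base class to subclass; `rdd.partitionBy(numPartitions, partitionFunc)` just takes a plain callable. A runnable pure-Python sketch of a range-style partition function (the boundary values are my own illustrative assumption):

```python
# Range-style partition function for PySpark's rdd.partitionBy().
# BOUNDARIES is an illustrative assumption, not from the thread.
BOUNDARIES = [100, 200, 300]  # keys < 100 -> 0, < 200 -> 1, < 300 -> 2, else 3

def range_partition(key):
    """Turn a numeric key into a partition id by range lookup."""
    for pid, upper in enumerate(BOUNDARIES):
        if key < upper:
            return pid
    return len(BOUNDARIES)

# Usage with a pair RDD (requires a SparkContext):
#   rdd.partitionBy(len(BOUNDARIES) + 1, range_partition)
```

For many boundaries, `bisect.bisect_right` would do the lookup in O(log n) instead of the linear scan.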
>>> data2 = data.repartition(50).persist()
>>> data2.count() # materialize rdd
>>> data.unpersist() # unpersist previous version
>>> data=data2
>>>
>>>
>>> Help and suggestions on this would be greatly appreciated!
>>> Thanks a lot!
>>>
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>>
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Memory-efficient-successive-calls-to-repartition-tp24358.html
>>> Sent from the Apache Spark User List mailing list archive
>>> at Nabble.com.
>>>
>>>
>>>
>
--
with Regards
Shahid Ashraf
nager.runAll(Utils.scala:2278)
    at org.apache.spark.util.SparkShutdownHookManager$$anon$6.run(Utils.scala:2260)
    at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
--
with Regards
Shahid Ashraf
> wrote:
>
>> Hi Sparkians
>>
>> How can we create a custom partitioner in PySpark?
>>
--
with Regards
Shahid Ashraf
just need
> to instantiate the Partitioner class with numPartitions and partitionFunc.
>
> On Tue, Sep 1, 2015 at 11:13 AM shahid ashraf <sha...@trialx.com> wrote:
>
>> Hi
>>
>> I did not get this, e.g. if I need to create a custom partitioner like a
>> range partitioner
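To make the two-argument form concrete: any deterministic callable from key to partition id works as `partitionFunc`. A minimal hash-based sketch (the function name and partition count are illustrative assumptions):

```python
NUM_PARTITIONS = 8  # illustrative

def partition_func(key):
    """Map a key to a partition id; any deterministic callable works."""
    return hash(key) % NUM_PARTITIONS  # Python's % always yields a non-negative id here

# With a SparkContext this would be used as:
#   rdd = sc.parallelize([("a", 1), ("b", 2)])
#   partitioned = rdd.partitionBy(NUM_PARTITIONS, partition_func)
```

Note that Python's built-in `hash` of strings is randomized per process, so for reproducible placement across runs you would set PYTHONHASHSEED or hash the key yourself.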
launch a spark-cluster, but I keep
getting the following message endlessly. Please help.
Warning: SSH connection error. (This could be temporary.)
Host:
SSH return code: 255
SSH output: ssh: Could not resolve hostname : Name or service not known
--
with Regards
Shahid Ashraf
? Or is there something I am doing wrong?
Thank you in advance for any pointers you can provide.
-sujit
--
with Regards
Shahid Ashraf
--
Best Regards,
Ayan Guha
--
with Regards
Shahid Ashraf
think this question is better suited for Stack Overflow
than for a PhD thesis.
On Tue, Jul 14, 2015 at 7:42 AM, shahid ashraf sha...@trialx.com
wrote:
hi
I have a 10-node cluster. I loaded the data onto HDFS, so the number of
partitions I get is 9. I am running a Spark application; it gets
Hi Mohammad
Can you provide more info about the service you developed?
On Jun 20, 2015 7:59 AM, Mohammed Guller moham...@glassbeam.com wrote:
Hi Matthew,
It looks fine to me. I have built a similar service that allows a user to
submit a query from a browser and returns the result in JSON
of the mailing list.
Let's continue the discussion there.
Cheers,
On 2/10/15 6:58 PM, shahid ashraf wrote:
thanks Costin
I'm grouping the data together based on id in the JSON, and the RDD contains
rdd = (1, {'SOURCES': [{n no. of key/value pairs}]}), (2, {'SOURCES': [{n no. of key/value pairs}]}), (3, {'SOURCES': [{n
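A pure-Python sketch of that grouping step (the record shape is assumed from the snippet above; the sample data is my own):

```python
from collections import defaultdict

# Assumed record shape: (id, {...key/value pairs...}), as in the snippet above.
records = [
    (1, {"a": 1}), (1, {"b": 2}),
    (2, {"c": 3}),
]

def group_by_id(pairs):
    """Collect every record's dict under its id, mirroring an rdd.groupByKey()."""
    grouped = defaultdict(list)
    for rec_id, payload in pairs:
        grouped[rec_id].append(payload)
    return {rec_id: {"SOURCES": payloads} for rec_id, payloads in grouped.items()}

# In PySpark this would be roughly:
#   rdd.groupByKey().mapValues(lambda vs: {"SOURCES": list(vs)})
```

If the per-id payloads are large, `reduceByKey` or `aggregateByKey` would avoid materializing the full group on one executor.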