Shahid
On Tue, 16 Jul 2019, 3:38 pm raman gugnani wrote:
> Hi,
>
> I have long-running Spark Streaming jobs.
> Event log directories are filling up with .inprogress files.
> Is there a fix or workaround for Spark Streaming?
>
> There is also one JIRA raised for the
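For reference, the knobs I know of here (a sketch; whether the history server's cleaner also removes .inprogress files of still-running applications depends on the Spark version, so check the release notes for yours). In spark-defaults.conf:

    spark.history.fs.cleaner.enabled   true
    spark.history.fs.cleaner.interval  1d
    spark.history.fs.cleaner.maxAge    7d

Newer releases also support rolling event logs (spark.eventLog.rolling.enabled and spark.eventLog.rolling.maxFileSize), which bound the size of a single long-running application's log.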
Hi folks,
I am using a standalone cluster of 50 servers on AWS. I loaded data onto HDFS,
so why am I getting a locality level of ANY for data on HDFS? I have 900+
partitions.
--
with Regards
Shahid Ashraf
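For what it's worth, ANY usually means the scheduler waited spark.locality.wait (3s by default) for a node-local slot and then gave up; it also appears when the executors are not running on the same hosts as the HDFS datanodes, in which case no task can ever be node-local. A sketch of the relevant spark-defaults.conf knobs (property names are standard, values are illustrative):

    spark.locality.wait       10s
    spark.locality.wait.node  10s
    spark.locality.wait.rack  5s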
<karthik.kadiyam...@gmail.com>
> wrote:
>
>> Hi Shahid,
>>
>> I played around with the Spark driver memory too. In the conf file it was set
>> to "--driver-memory 20G" first. When I changed the Spark driver
>> maxResultSize from the default to 2g, I changed th
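For reference, both knobs can live in spark-defaults.conf (these are the standard property names; the values here are illustrative):

    spark.driver.memory         20g
    spark.driver.maxResultSize  2g

Since collected results are held in the driver heap, spark.driver.maxResultSize should stay comfortably below spark.driver.memory.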
Hi,
I guess you need to increase the Spark driver memory as well, but that should
be set in the conf files.
Let me know if that resolves it.
On Oct 30, 2015 7:33 AM, "karthik kadiyam" wrote:
> Hi,
>
> In my Spark Streaming job I had the following setting:
>
>
Hi,
I am running a 10-node standalone cluster on AWS and loading 100G of data onto
HDFS. I do a groupBy first, and then generate pairs from the grouped RDD
(key, [a1, b1]), (key, [a, b, c]), producing pairs like
(a1, b1), (a, b), (a, c) ... n.
The pair RDD gets large in size.
Some stats from the UI when
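To make the blow-up concrete, a minimal PySpark sketch (app name and sample data are made up) of generating all unordered pairs per group; a group with n values yields n*(n-1)/2 pairs, which is why the pair RDD grows so fast:

    from itertools import combinations
    from pyspark import SparkContext

    sc = SparkContext(appName="pair_generation_sketch")
    # shape after a groupBy: (key, [values...])
    grouped = sc.parallelize([("k1", ["a1", "b1"]), ("k2", ["a", "b", "c"])])
    # each group of n values expands to n*(n-1)/2 pairs
    pairs = grouped.flatMap(lambda kv: combinations(kv[1], 2))
    print(pairs.collect())  # [('a1', 'b1'), ('a', 'b'), ('a', 'c'), ('b', 'c')]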
@all, I did partitionBy using the default hash partitioner on data of the form
[(1, data), (2, data), ..., (n, data)].
The total data was approx 3.5; the UI showed a shuffle write of 50G, and on the next action
(e.g. count) it shows a shuffle read of 50G. I don't understand this
behaviour, and I think the performance is getting slow with
> to write a custom partitioner to help
> Spark distribute the data more uniformly.
>
> Sent from my iPhone
>
> On 17 Oct 2015, at 16:14, shahid ashraf <sha...@trialx.com> wrote:
>
> Yes, I know about that; that is for the case of reducing partitions. The point here is
> the data is
Hi folks,
I need to repartition a large set of data, around 300G, as I see some portions
have much more data than others (data skew).
I have pair RDDs [({},{}),({},{}),({},{})].
What is the best way to solve the problem?
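One common approach to the skew is key salting: spread each hot key over several sub-keys so one key no longer pins one partition. A minimal PySpark sketch (not tested at 300G; the salt factor and data are illustrative):

    import random
    from pyspark import SparkContext

    sc = SparkContext(appName="salting_sketch")
    SALT = 20  # fan-out per hot key; something you would tune

    # toy skewed data: one very hot key, one cold key
    pair_rdd = sc.parallelize([("hot", i) for i in range(1000)] + [("cold", 1)])

    # spread each key over SALT sub-keys before partitioning
    salted = pair_rdd.map(lambda kv: ((kv[0], random.randrange(SALT)), kv[1]))
    partitioned = salted.partitionBy(100)

    # aggregate per salted key first, then merge the per-salt results
    counts = partitioned.mapValues(lambda _: 1).reduceByKey(lambda a, b: a + b)
    merged = counts.map(lambda kv: (kv[0][0], kv[1])).reduceByKey(lambda a, b: a + b)
    print(merged.collect())  # [('hot', 1000), ('cold', 1)] in some order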
ns. This one minimizes the data shuffle.
>
> -Raghav
>
> On Sat, Oct 17, 2015 at 1:02 PM, shahid qadri <shahidashr...@icloud.com>
> wrote:
>
>> Hi folks
>>
>> I need to repartition a large set of data, around 300G, as I see some
Hi,
I tried to build the latest master branch of Spark:
build/mvn -DskipTests clean package
Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ... SUCCESS [03:46 min]
[INFO] Spark Project Test Tags ............ SUCCESS [01:02 min]
[INFO] Spark Project
Hi folks,
How can I submit my Spark app (Python) to the cluster without using
spark-submit? I actually need to invoke jobs from a UI.
> Hadoop distros might, for example EMR in AWS has a job submit UI.
>
> Spark submit just calls a REST API; you could build any UI you want on top of
> that...
>
>
> On Tue, Oct 6, 2015 at 9:37 AM, shahid qadri <shahidashr...@icloud.com>
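To illustrate the REST route: a sketch, not a supported public API. This is the standalone master's hidden submission endpoint (port 6066 by default) that spark-submit itself uses in standalone cluster mode; host, paths, and version are placeholders, and standalone cluster mode has historically not accepted Python applications, which is why this example submits a JAR:

    import requests

    payload = {
        "action": "CreateSubmissionRequest",
        "appResource": "hdfs:///apps/myapp.jar",  # placeholder
        "mainClass": "com.example.MyApp",         # placeholder
        "appArgs": ["arg1"],
        "clientSparkVersion": "1.5.0",
        "environmentVariables": {"SPARK_ENV_LOADED": "1"},
        "sparkProperties": {
            "spark.app.name": "MyApp",
            "spark.master": "spark://master-host:6066",
            "spark.submit.deployMode": "cluster",
            "spark.jars": "hdfs:///apps/myapp.jar",
        },
    }
    resp = requests.post("http://master-host:6066/v1/submissions/create",
                         json=payload)
    print(resp.json())  # returns a submissionId you can poll for status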
Hi folks,
Any resources to get started using https://github.com/amplab/spark-indexedrdd
in PySpark?
--
with Regards
Shahid Ashraf
> >
> > class RangePartitioner(Partitioner):
> >     def __init__(self, numParts):
> >         self.numPartitions = numParts
> >         self.partitionFunction = self.rangePartition
> >
> >     def rangePartition(self, key):
> >         # logic to turn the key into a partition id
> >         return partition_id  # placeholder for the computed id
> >
15/09/02 21:12:43 INFO DAGScheduler: ShuffleMapStage 10 (repartition at
NativeMethodAccessorImpl.java:-2) failed in 102.132 s
15/09/02 21:12:43 INFO DAGScheduler: Job 4 failed: collect at
/Users/shahid/projects/spark_rl/record_linker_spark.py:74, took 102.154710 s
Traceback (most recent call last):
File "/Users/shahid/projects/spark_rl/record_linker_sp
Hi Sparkians,
How can we create a custom partitioner in PySpark?
> On Aug 25, 2015, at 10:43 PM, shahid qadri <shahidashr...@icloud.com> wrote:
>
> Any resources on this
>
>> On Aug 25, 2015, at 3:15 PM, shahid qadri <shahidashr...@icloud.com> wrote:
>>
>> I would like to implement the sorted neighborhood approach i
> See below
>
> class MyPartitioner extends Partitioner {
>   def numPartitions: Int = ???  // return the number of partitions
>   def getPartition(key: Any): Int = ???  // return the partition for a given key
> }
>
> On Tue, Sep 1, 2015 at 10:15 AM shahid qadri <shahidashr...@icloud.com>
> just need
> to instantiate the Partitioner class with numPartitions and partitionFunc.
>
> On Tue, Sep 1, 2015 at 11:13 AM shahid ashraf <sha...@trialx.com> wrote:
>
>> Hi
>>
>> I did not get this, e.g. if I need to create a custom partitioner like a
>> range partitione
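A minimal end-to-end sketch of that suggestion (the key scheme and numbers are illustrative): in PySpark you do not subclass anything; partitionBy takes a plain function, applies it to each key, and takes the result modulo numPartitions:

    from pyspark import SparkContext

    sc = SparkContext(appName="custom_partitioner_sketch")

    def range_partition(key):
        # toy range scheme: keys 0-99 go to bucket 0, 100-199 to bucket 1, ...
        return key // 100

    pairs = sc.parallelize([(k, str(k)) for k in range(0, 1000, 7)])
    # Spark computes range_partition(key) % 10 to pick the partition
    partitioned = pairs.partitionBy(10, partitionFunc=range_partition)
    print(partitioned.glom().map(len).collect())  # records per partition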
Any resources on this
On Aug 25, 2015, at 3:15 PM, shahid qadri <shahidashr...@icloud.com> wrote:
I would like to implement the sorted neighborhood approach in Spark; what is the
best way to write that in PySpark?
I would like to implement the sorted neighborhood approach in Spark; what is the
best way to write that in PySpark?
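Since the question comes up twice, a minimal PySpark sketch of the idea (window size and records are illustrative; the blocking key is the record's first field): sort by the blocking key, give each record a global position, replicate each record into the next WINDOW-1 neighbourhoods, and pair up records that share a neighbourhood:

    from pyspark import SparkContext

    sc = SparkContext(appName="sorted_neighborhood_sketch")
    WINDOW = 3  # compare each record with the WINDOW-1 records after it

    records = sc.parallelize([("smith", 1), ("smyth", 2), ("jones", 3), ("johns", 4)])

    # sort on the blocking key and attach a global position
    indexed = records.sortBy(lambda r: r[0]).zipWithIndex()  # ((key, id), pos)

    # a record at position p joins neighbourhoods p, p+1, ..., p+WINDOW-1,
    # so two records share a neighbourhood iff they are < WINDOW apart
    exploded = indexed.flatMap(lambda rp: [(rp[1] + w, rp) for w in range(WINDOW)])

    def pairs_in_group(group):
        members = sorted(group, key=lambda rp: rp[1])  # order by position
        return [(a[0], b[0]) for i, a in enumerate(members) for b in members[i + 1:]]

    candidates = exploded.groupByKey().flatMap(lambda kv: pairs_in_group(kv[1])).distinct()
    print(candidates.collect())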
launch a Spark cluster but
am getting the following message endlessly. Please help.
Warning: SSH connection error. (This could be temporary.)
Host:
SSH return code: 255
SSH output: ssh: Could not resolve hostname : Name or service not known
--
with Regards
Shahid Ashraf
? Or is there something I am doing wrong?
Thank you in advance for any pointers you can provide.
-sujit
--
with Regards
Shahid Ashraf
, then it's a node issue; else, it's most likely a data
issue.
On Tue, Jul 14, 2015 at 11:43 PM, shahid <sha...@trialx.com> wrote:
Hi,
I have a 10-node cluster. I loaded the data onto HDFS, so the number of
partitions I get is 9. I am running a Spark application; it gets stuck on
one of the tasks, looking
- there's a lot of work being done on this.
Best, Will
On 15 July 2015 at 09:01, Vineel Yalamarthy <vineelyalamar...@gmail.com> wrote:
Hi Daniel
Well said
Regards
Vineel
On Tue, Jul 14, 2015, 6:11 AM Daniel Darabos <daniel.dara...@lynxanalytics.com> wrote:
Hi Shahid,
To be honest I
Hi,
I have a 10-node cluster. I loaded the data onto HDFS, so the number of
partitions I get is 9. I am running a Spark application; it gets stuck on
one of the tasks. Looking at the UI, it seems the application is not using all
nodes to do calculations. Attached is a screenshot of the tasks; it seems the tasks
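One thing worth checking (a sketch; the path and counts are placeholders): 9 input partitions can never keep a 10-node cluster busy, so ask HDFS for more splits up front, or widen the RDD before the heavy stage:

    from pyspark import SparkContext

    sc = SparkContext(appName="partitioning_sketch")
    # ask for more input splits when reading (path is a placeholder)
    data = sc.textFile("hdfs:///data/input", minPartitions=40)
    # or widen an existing RDD; 2-4x the total core count is a common rule of thumb
    data = data.repartition(40)
    print(data.getNumPartitions())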
Hi Mohammed,
Can you provide more info about the service you developed?
On Jun 20, 2015 7:59 AM, Mohammed Guller <moham...@glassbeam.com> wrote:
Hi Matthew,
It looks fine to me. I have built a similar service that allows a user to
submit a query from a browser and returns the result in JSON
conf = SparkConf().setAppName("spark_calc3merged").setMaster("spark://ec2-54-145-68-13.compute-1.amazonaws.com:7077")
sc = SparkContext(conf=conf, pyFiles=["/root/platinum.py", "/root/collections2.py"])
15/02/28 19:06:38 WARN scheduler.TaskSetManager: Lost task 5.0 in stage 3.0
(TID 38,
Also, the data file is on HDFS.
Hi Costin,
I upgraded the es-hadoop connector, and at this point I can't
use Scala, but I am still getting the same error.
On Tue, Feb 10, 2015 at 10:34 PM, Costin Leau <costin.l...@gmail.com> wrote:
Hi shahid,
I've sent the reply to the group - for some reason I replied to your
address instead
INFO scheduler.TaskSetManager: Starting task 2.1 in stage 2.0 (TID 9,
ip-10-80-98-118.ec2.internal, PROCESS_LOCAL, 1025 bytes)
15/02/10 15:54:08 INFO scheduler.TaskSetManager: Lost task 1.0 in stage 2.0
(TID 6) on executor ip-10-80-15-145.ec2.internal:
org.apache.spark.SparkException (Data of type
The log is here: py4j.protocol.Py4JError: An error occurred while calling
o22.__getnewargs__. Trace:
py4j.Py4JException: Method __getnewargs__([]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
at
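For context, this particular Py4J error is the classic symptom of a closure capturing the SparkContext (or another driver-side Py4J object): pickling it triggers the __getnewargs__ probe, which the Java-backed object does not support. A minimal reproduction and fix (names are illustrative):

    from pyspark import SparkContext

    sc = SparkContext(appName="getnewargs_sketch")
    rdd = sc.parallelize(range(10))

    # BROKEN: the lambda closes over `sc`; pickling it raises
    # "py4j.Py4JException: Method __getnewargs__([]) does not exist"
    # bad = rdd.map(lambda x: sc.broadcast(x)).collect()

    # FIX: keep driver-side objects out of closures; ship plain data instead
    ok = rdd.map(lambda x: x * 2).collect()
    print(ok)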
Hi guys,
I have just started using Spark, and I am getting this as an INFO message:
15/01/02 11:54:17 INFO DAGScheduler: Parents of final stage: List()
15/01/02 11:54:17 INFO DAGScheduler: Missing parents: List()
15/01/02 11:54:17 INFO DAGScheduler: Submitting Stage 6 (PythonRDD[12] at
RDD at