PySpark Structured Streaming - using previous iteration computed results in current iteration

2018-05-16 Thread Ofer Eliassaf
…hour. We want to keep the labels and the sample IDs around for the next iteration (N+1), where we want to join with the new sample window so that samples that already existed in the previous iteration (N) inherit their labels. -- Regards, Ofer Eliassaf
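The carry-over described here has no first-class support in Structured Streaming; one workable approximation is a per-micro-batch join against the previous iteration's persisted output. A minimal sketch, assuming Spark 2.4+ (for foreachBatch) and hypothetical paths and column names (sample_id, label) - not the thread author's actual code:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("label-carryover").getOrCreate()

LABELS_PATH = "/tmp/labels"  # hypothetical hand-off point between iterations

samples = (spark.readStream
           .schema("sample_id STRING, features STRING, event_time TIMESTAMP")
           .json("/tmp/samples"))  # hypothetical streaming input directory

def carry_over_labels(batch_df, batch_id):
    # Join the new window's samples against the labels computed in the
    # previous iteration, so previously seen samples inherit their labels.
    try:
        prev = spark.read.parquet(LABELS_PATH).cache()
        prev.count()  # materialize before overwriting the same path below
    except Exception:  # first iteration: nothing written yet
        prev = spark.createDataFrame([], "sample_id STRING, label STRING")
    labeled = batch_df.join(prev, "sample_id", "left")
    # ... assign labels to the still-unlabeled samples here ...
    (labeled.select("sample_id", "label")
            .write.mode("overwrite").parquet(LABELS_PATH))

query = (samples.writeStream
         .trigger(processingTime="1 hour")  # one iteration per hourly window
         .option("checkpointLocation", "/tmp/labels-ckpt")  # hypothetical
         .foreachBatch(carry_over_labels)
         .start())
query.awaitTermination()
```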

Re: pyspark cluster mode on standalone deployment

2017-03-05 Thread Ofer Eliassaf
Anyone? Please? Is this getting any priority? On Tue, Sep 27, 2016 at 3:38 PM, Ofer Eliassaf <ofer.elias...@gmail.com> wrote: > Is there any plan to support Python Spark running in "cluster mode" on a standalone deployment? There is this famous survey…

Re: PySpark TaskContext

2016-11-24 Thread Ofer Eliassaf
…wrote: >> Cool - thanks. I'll circle back with the JIRA number once I've got it created - it will probably take a while before it lands in a Spark release (since 2.1 has already branched), but better debugging information for Python users is certainly important…

Re: PySpark TaskContext

2016-11-24 Thread Ofer Eliassaf
> On Thu, Nov 24, 2016 at 1:39 AM, ofer <ofer.elias...@gmail.com> wrote: >> Hi, is there a way to get in PySpark something like TaskContext from code running on an executor, like in Scala Spark? >> If not, how can I know my task ID from inside…

PySpark TaskContext

2016-11-24 Thread ofer
Hi, is there a way to get in PySpark something like TaskContext from code running on an executor, like in Scala Spark? If not, how can I know my task ID from inside the executors? Thanks!
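For later readers: TaskContext was eventually exposed to Python (Spark 2.2 and later), and before that the closest stand-in was the partition index from mapPartitionsWithIndex. A minimal sketch of both, assuming a recent PySpark:

```python
from pyspark import SparkContext, TaskContext

sc = SparkContext(appName="taskcontext-example")

def tag_with_task_info(it):
    tc = TaskContext.get()  # only valid in code running on an executor
    for x in it:
        yield (tc.partitionId(), tc.attemptNumber(), x)

print(sc.parallelize(range(8), 4).mapPartitions(tag_with_task_info).collect())

# Pre-2.2 fallback: the partition index from mapPartitionsWithIndex is the
# closest readily available substitute for a task id.
print(sc.parallelize(range(8), 4)
        .mapPartitionsWithIndex(lambda i, it: ((i, x) for x in it))
        .collect())
```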

Dynamic Resource Allocation in a standalone

2016-10-27 Thread Ofer Eliassaf
…applications will get the total amount of cores until a new application arrives… -- Regards, Ofer Eliassaf
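For context, that is standalone scheduling's default: with spark.cores.max unset, the first application takes every core. A minimal sketch of the two usual remedies - a static core cap, or dynamic allocation, which on standalone also needs the external shuffle service running on each worker; the master URL is hypothetical:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("spark://master-host:7077")  # hypothetical standalone master
         .appName("resource-limits-example")
         # Static alternative: hard-cap the cores this application may take.
         .config("spark.cores.max", "8")
         # Dynamic allocation: executors are added/removed with the workload.
         # Requires the external shuffle service on every worker.
         .config("spark.dynamicAllocation.enabled", "true")
         .config("spark.shuffle.service.enabled", "true")
         .config("spark.dynamicAllocation.minExecutors", "1")
         .config("spark.dynamicAllocation.maxExecutors", "4")
         .getOrCreate())
```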

Re: spark standalone with multiple workers gives a warning

2016-10-06 Thread Ofer Eliassaf
> So how would I start a cluster of 3? SPARK_WORKER_INSTANCES is the only way I see to start the standalone cluster, and the only way I see to define it is in spark-env.sh. The spark-submit option, SPARK_EXECUTOR_INSTANCES, and spark.executor.instances are all related to submitting the job. > Any ideas? > Thanks, Assaf -- Regards, Ofer Eliassaf

Re: Pyspark not working on yarn-cluster mode

2016-09-27 Thread ofer
I advise you to use Livy for this purpose. Livy works well with YARN, and it will decouple Spark from your web app. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Pyspark-not-working-on-yarn-cluster-mode-tp23755p27799.html
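For reference, the decoupling works because the web app only speaks HTTP to Livy and needs no Spark installation of its own. A minimal sketch against Livy's batch endpoint, with hypothetical host and script path:

```python
import requests

LIVY_URL = "http://livy-host:8998"  # hypothetical Livy server

# Submit a PySpark script as a Livy batch; Livy hands it to YARN, so the
# web application itself never touches Spark.
resp = requests.post(
    f"{LIVY_URL}/batches",
    json={"file": "hdfs:///apps/my_job.py", "args": ["--date", "2016-09-27"]},
)
batch = resp.json()
print("submitted batch", batch["id"], "state:", batch["state"])

# Poll the batch for its current state.
state = requests.get(f"{LIVY_URL}/batches/{batch['id']}").json()["state"]
print("current state:", state)
```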

pyspark cluster mode on standalone deployment

2016-09-27 Thread Ofer Eliassaf
…availability in Python Spark. Currently only YARN deployment supports it. Bringing in the huge YARN installation just for this feature is not fun at all. Does anyone have a time estimate for this? -- Regards, Ofer Eliassaf
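Until such support lands, the only deploy mode for Python applications on a standalone cluster is client mode, where the driver stays on the submitting machine. A minimal sketch, with a hypothetical master URL:

```python
from pyspark.sql import SparkSession

# Client mode: this process *is* the driver; only the executors run on the
# standalone cluster. (Cluster deploy mode is rejected for Python apps on
# standalone at the time of this thread.)
spark = (SparkSession.builder
         .master("spark://master-host:7077")  # hypothetical standalone master
         .appName("client-mode-driver")
         .getOrCreate())

print(spark.range(1000).count())  # executed on the cluster's executors
```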

Re: Strange behavior with PySpark when using Join() and zip()

2015-03-23 Thread Ofer Mendelevitch
…to this same issue. Maybe worth mentioning in the docs, then? Ofer. On Mar 23, 2015, at 11:40 AM, Sean Owen <so...@cloudera.com> wrote: I think the explanation is that the join does not guarantee any order, since it causes a shuffle in general, and it is computed twice in the first example…
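A small illustration of the pitfall Sean describes (not code from the thread): each action on an RDD derived from the join recomputes the join, and since the shuffle fixes no order, two derived RDDs can see rows in different orders, so zip() may silently misalign them. Materializing the join once is the usual mitigation:

```python
from pyspark import SparkContext

sc = SparkContext(appName="join-zip-pitfall")

left  = sc.parallelize([(k, k * 10)  for k in range(10000)], 16)
right = sc.parallelize([(k, k * 100) for k in range(10000)], 16)

joined = left.join(right)  # shuffle: output order is not guaranteed

# Risky: keys() and values() each recompute `joined`, and the two
# computations may emit rows in different orders, misaligning the zip.
risky = joined.keys().zip(joined.values())

# Mitigation: materialize the join once so both derived RDDs share it.
joined_cached = joined.cache()
joined_cached.count()  # force the single computation
safe = joined_cached.keys().zip(joined_cached.values())
```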