--
Andrei Balici
Student at the School of Computer Science,
University of Manchester
1. SPARK-16859 <https://issues.apache.org/jira/browse/SPARK-16859>
submitted
On Tue, Aug 2, 2016 at 9:07 PM, Andrei Ivanov <aiva...@iponweb.net> wrote:
> OK, answering myself - this is broken since 1.6.2 by SPARK-13845
> <https://issues.apache.org/jira/browse/SPARK-13845>
OK, answering myself - this is broken since 1.6.2 by SPARK-13845
<https://issues.apache.org/jira/browse/SPARK-13845>
On Tue, Aug 2, 2016 at 12:10 AM, Andrei Ivanov <aiva...@iponweb.net> wrote:
> Hi all,
>
> I've just tried upgrading Spark to 2.0 and so far it
Thanks, Andrei Ivanov.
context initialization later.
>
>
>
> So generally, for yarn-client, maybe you can skip spark-submit and directly
> launch the Spark application, with the necessary configuration set up before
> creating the new SparkContext.
>
>
>
> Not sure about your error, have you set up YARN_CONF_DIR?
>
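A rough sketch of the idea in the quoted reply above - configure YARN before the
SparkContext is created instead of going through spark-submit. Paths and settings
here are assumptions, not a verified recipe:

    import os
    from pyspark import SparkConf, SparkContext

    # Point Spark at the YARN/Hadoop configuration before any JVM is started.
    os.environ["HADOOP_CONF_DIR"] = "/etc/hadoop/conf"   # assumed location
    os.environ["YARN_CONF_DIR"] = "/etc/hadoop/conf"

    conf = SparkConf().setMaster("yarn").setAppName("embedded-yarn-client-app")
    sc = SparkContext(conf=conf)   # client-mode driver runs in this process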
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala#L47
> and
>
>
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/RRunner.scala#L65
>
>
>
>
>
> *From:* Andrei [mailto:faithlessf
launches an in-process JVM for SparkContext, which is a
> separate JVM from the one launched by spark-submit. So you need a way,
> typically an environment variable, like “SPARKR_SUBMIT_ARGS”
> for SparkR or “PYSPARK_SUBMIT_ARGS” for pyspark, to pass command line args
> to the in-process JVM.
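For PySpark that mechanism looks roughly like the sketch below (a hedged example,
not taken from this thread; the trailing "pyspark-shell" token is what the launcher
expects when the variable is set programmatically):

    import os
    from pyspark import SparkContext

    # Must be set before the SparkContext (and its in-process JVM) is created.
    os.environ["PYSPARK_SUBMIT_ARGS"] = "--master yarn --num-executors 4 pyspark-shell"

    sc = SparkContext(appName="embedded-pyspark-app")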
I'm working on a wrapper [1] around Spark for the Julia programming
language [2], similar to PySpark. I've got it working with the Spark Standalone
server by creating a local JVM and setting the master programmatically. However,
this approach doesn't work with YARN (and probably Mesos), which require
running
[Int], resultHandler: (Int, U) ⇒ Unit, resultFunc: ⇒ R):
>> SimpleFutureAction[R]
>> <http://spark.apache.org/docs/latest/api/scala/org/apache/spark/SimpleFutureAction.html>
>>
>>
>>
>>
>>
>> *From:* Hemant Bhanawat [mai
I'm writing a kind of sampler which in most cases will require only 1
partition, sometimes 2, and very rarely more. So it doesn't make sense to
process all partitions in parallel. What is the easiest way to limit
computation to one partition only?
So far the best idea I've come up with is to create a
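One minimal PySpark sketch of this (assuming a version where sc.runJob is exposed;
'sc' and 'rdd' are illustrative names from the application): runJob takes an
explicit list of partition indices, so only those partitions are ever computed.

    # Compute only partition 0 of the RDD; the other partitions are untouched.
    sample = sc.runJob(rdd, lambda it: list(it), partitions=[0])

    # If that wasn't enough, pull in the next partition as well.
    more = sc.runJob(rdd, lambda it: list(it), partitions=[0, 1])

This is essentially what rdd.take(n) does internally: it scans partitions
incrementally instead of launching tasks for all of them.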
Does it make sense to use Spark's actor system (e.g. via
SparkContext.env.actorSystem) to create parameter server?
On Fri, Jan 9, 2015 at 10:09 PM, Peng Cheng rhw...@gmail.com wrote:
You are not the first :) and probably not even the fifth to have this question.
A parameter server is not included in
Not that I'm a professional user of Amazon services, but I have a guess about
your performance issues. From [1], there are two different filesystems over
S3:
- native, which behaves just like regular files (scheme: s3n)
- block-based, which looks more like HDFS (scheme: s3)
Since you use s3n in your
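For illustration only (bucket and paths are made up), the URI scheme is what
selects the S3 filesystem implementation:

    logs = sc.textFile("s3n://my-bucket/logs/part-*")        # "native" S3 filesystem
    blocks = sc.textFile("s3://my-bucket/block-store/data")  # block-based filesystem
                                                              # (data written through it)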
Concerning your second question, I believe you are trying to set the number of
partitions with something like this:
rdd = sc.textFile(..., 8)
but things like `textFile()` don't actually take a fixed number of
partitions. Instead, they expect a *minimum* number of partitions. Since in
your file you have 21
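A small sketch of the difference (file name is made up; 'sc' is the usual
SparkContext):

    rdd = sc.textFile("data.txt", minPartitions=8)
    rdd.getNumPartitions()            # a lower bound was requested; the actual
                                      # count depends on the input splits

    rdd.repartition(8).getNumPartitions()   # exactly 8, at the cost of a shuffle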
Hey, I don't think that's the issue: foreach is called on 'results', which is
a DStream of floats, so naturally it passes RDDs to its function.
And either way, changing the code in the first mapper to comment out the
map-reduce process on the RDD
Float f = 1.0f; //nnRdd.map(new FunctionNeuralNet,
Say you have a Spark Streaming setup such as
JavaReceiverInputDStream... rndLists = jssc.receiverStream(new
JavaRandomReceiver(...));
rndLists.map(new NeuralNetMapper(...))
.foreach(new JavaSyncBarrier(...));
Is there any way of ensuring that, say, a JavaRandomReceiver and
Hey guys, so the problem I'm trying to tackle is the following:
- I need a data source that emits messages at a certain frequency
- There are N neural nets that need to process each message individually
- The outputs from all neural nets are aggregated and only when all N
outputs for each message
._collect_iterator_through_file(javaIterator)
On Fri, Aug 1, 2014 at 3:04 PM, Andrei faithlessfri...@gmail.com wrote:
Thanks, Aaron, it should be fine with partitions (I can repartition it
anyway, right?).
But rdd.toLocalIterator is a purely Java/Scala method. Is there a Python
interface to it?
I can get a Java iterator
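A sketch of the idea, assuming a PySpark version where RDD.toLocalIterator() is
exposed directly ('rdd' here is assumed to be an RDD of text lines): it pulls one
partition at a time to the driver, so the whole dataset never needs to fit in
driver memory.

    import gzip

    # Stream the RDD through the driver and gzip it into a single local file.
    with gzip.open("output.gz", "wt") as out:
        for line in rdd.toLocalIterator():
            out.write(line + "\n")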
Is there a way to get an iterator from an RDD? Something like rdd.collect(), but
returning a lazy sequence and not a single array.
Context: I need to GZip processed data to upload it to Amazon S3. Since the
archive should be a single file, I want to iterate over the RDD, writing each
line to a local .gz file. File
individual line).
Hopefully that's sufficient, though.
On Fri, Aug 1, 2014 at 1:38 AM, Andrei faithlessfri...@gmail.com wrote:
Is there a way to get an iterator from an RDD? Something like rdd.collect(),
but returning a lazy sequence and not a single array.
Context: I need to GZip processed data to upload
I used both - Oozie and Luigi - but found them inflexible and still
overcomplicated, especially in the presence of Spark.
Oozie has a fixed list of building blocks, which is pretty limiting. For
example, you can launch a Hive query, but Impala, Shark/SparkSQL, etc. are
out of scope (of course, you can
Another +1. For me it's a question of embedding. With
SparkConf/SparkContext I can easily create larger projects with Spark as a
separate service (just like MySQL and JDBC, for example). With spark-submit
I'm bound to Spark as the main framework that defines how my application
should look.
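A small sketch of that embedding style (master URL, path and app name are made
up): the application owns the SparkContext lifecycle, and Spark is just one
component of it.

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("my-service")
            .setMaster("spark://spark-master:7077"))
    sc = SparkContext(conf=conf)
    try:
        event_count = sc.textFile("/data/events").count()
    finally:
        sc.stop()   # the rest of the application keeps running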
Hi Shivani,
Adding JARs to the classpath (e.g. via the -cp option) is needed to run your
_local_ Java application, whatever it is. To deliver them to _other
machines_ for execution, you need to add them to the SparkContext. You can
do it in 2 different ways:
1. Add them right from your code (your
).
[1]: http://spark.apache.org/docs/latest/submitting-applications.html
On Thu, Jun 5, 2014 at 8:10 PM, mrm ma...@skimlinks.com wrote:
Hi Andrei,
Thank you for your help! Just to make sure I understand, when I run this
command, sc.addPyFile("/path/to/yourmodule.py"), I need to be already
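A hedged sketch of how addPyFile is typically used (module name, path, and the
'sc'/'rdd' names are hypothetical): the call goes on an already-created
SparkContext, and the file is shipped to the executors so it can be imported
inside tasks.

    sc.addPyFile("/path/to/yourmodule.py")

    def apply_module(x):
        import yourmodule            # resolved on the executor from the shipped file
        return yourmodule.process(x)

    result = rdd.map(apply_module).collect()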
.jar files in your Scala program into a
directory. It doesn't merge the .jar files together; the .jar files
are left as is.
On Sat, May 31, 2014 at 3:42 AM, Andrei faithlessfri...@gmail.com wrote:
Thanks, Stephen. I have eventually decided to go with assembly, but put
away
Spark
interactive dev setup - something that doesn't require a full rebuild.
[1]: https://github.com/faithlessfriend/sample-spark-project
Thanks and have a good weekend,
Andrei
On Thu, May 29, 2014 at 8:27 PM, Stephen Boesch java...@gmail.com wrote:
The MergeStrategy combined with sbt assembly did work
My questions are:
1. Is an uberjar a recommended way of running Spark applications?
2. If so, should I include Spark itself in this large jar?
3. If not, what is the recommended way to do both - development and
deployment (assuming an ordinary sbt project)?
Thanks,
Andrei
like that for Spark/SBT?
Thanks,
Andrei
On Thu, May 29, 2014 at 3:48 PM, jaranda jordi.ara...@bsc.es wrote:
Hi Andrei,
I think the preferred way to deploy Spark jobs is by using the sbt package
task instead of using the sbt assembly plugin. In any case, as you comment,
the mergeStrategy
Do you need cosine distance and correlation between vectors or between
variables (elements of a vector)? It would be helpful if you could tell us
more details about your task.
On Thu, May 22, 2014 at 5:49 PM, jamal sasha jamalsha...@gmail.com wrote:
Hi,
I have a bunch of vectors like
Spark as a local jar to every
project. But both of these ways look overcomplicated and generally wrong.
So what is the intended way to do it?
Thanks,
Andrei
to every
project. But both of these ways look overcomplicated and generally wrong.
What is the intended way to solve this issue?
Thanks,
Andrei