unsubscribe

2018-03-27 Thread Andrei Balici
-- Andrei Balici, Student at the School of Computer Science, University of Manchester

Re: Spark 2.0 History Server Storage

2016-08-02 Thread Andrei Ivanov
1. SPARK-16859 <https://issues.apache.org/jira/browse/SPARK-16859> submitted. On Tue, Aug 2, 2016 at 9:07 PM, Andrei Ivanov <aiva...@iponweb.net> wrote: > OK, answering myself - this is broken since 1.6.2 by SPARK-13845 > <https://issues.apache.org/jira/browse/SPARK-1

Re: Spark 2.0 History Server Storage

2016-08-02 Thread Andrei Ivanov
OK, answering myself - this is broken since 1.6.2 by SPARK-13845 <https://issues.apache.org/jira/browse/SPARK-13845> On Tue, Aug 2, 2016 at 12:10 AM, Andrei Ivanov <aiva...@iponweb.net> wrote: > Hi all, > > I've just tried upgrading Spark to 2.0 and so far it

Spark 2.0 History Server Storage

2016-08-01 Thread Andrei Ivanov
? Thanks, Andrei Ivanov.

Re: How does spark-submit handle Python scripts (and how to repeat it)?

2016-04-14 Thread Andrei
ontext initialization later. > So generally for yarn-client, maybe you can skip spark-submit and directly launch the spark application with some configuration set up before creating a new SparkContext. > Not sure about your error, have you set up YARN_CONF_DIR? >
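A minimal sketch of what that suggestion could look like from a PySpark driver; the config path and master string are assumptions, not taken from the thread:

```python
import os
from pyspark import SparkConf, SparkContext

# YARN_CONF_DIR must point at the Hadoop/YARN client configuration and must be
# set before the JVM backing the SparkContext is launched.
# "/etc/hadoop/conf" is an assumption -- use wherever your cluster configs live.
os.environ.setdefault("YARN_CONF_DIR", "/etc/hadoop/conf")

# In yarn-client mode the driver runs in this process and only the executors
# run on YARN, so the application can start without going through spark-submit.
conf = SparkConf().setMaster("yarn-client").setAppName("embedded-yarn-app")
sc = SparkContext(conf=conf)

print(sc.parallelize(range(100)).sum())
sc.stop()
```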

Re: How does spark-submit handle Python scripts (and how to repeat it)?

2016-04-13 Thread Andrei
; https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala#L47 > and > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/RRunner.scala#L65 > From: Andrei [mailto:faithlessf

Re: How does spark-submit handle Python scripts (and how to repeat it)?

2016-04-12 Thread Andrei
launches an in-process JVM for SparkContext, which is a > separate JVM from the one launched by spark-submit. So you need a way, > typically an environment variable, like “SPARKR_SUBMIT_ARGS” > for SparkR or “PYSPARK_SUBMIT_ARGS” for pyspark, to pass command line args > to the in-
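For PySpark, the environment-variable route described above usually looks roughly like this; the flags below are illustrative, not taken from the thread:

```python
import os
from pyspark import SparkConf, SparkContext

# PYSPARK_SUBMIT_ARGS is read when PySpark launches its gateway JVM, so it must
# be set before the first SparkContext is created. PySpark expects the string
# to end with the "pyspark-shell" token.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--master yarn-client --num-executors 4 --executor-memory 2g pyspark-shell"
)

sc = SparkContext(conf=SparkConf().setAppName("launched-without-spark-submit"))
```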

How does spark-submit handle Python scripts (and how to repeat it)?

2016-04-11 Thread Andrei
I'm working on a wrapper [1] around Spark for the Julia programming language [2] similar to PySpark. I've got it working with Spark Standalone server by creating local JVM and setting master programmatically. However, this approach doesn't work with YARN (and probably Mesos), which require running
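The part that reportedly works, pointing the driver at a standalone master programmatically, would look roughly like this in PySpark terms (the master URL is a placeholder):

```python
from pyspark import SparkConf, SparkContext

# Against a standalone master the driver can simply be created in-process:
# no spark-submit involved, the master URL alone is enough.
conf = (SparkConf()
        .setMaster("spark://master-host:7077")   # placeholder master URL
        .setAppName("standalone-embedded-app"))
sc = SparkContext(conf=conf)
```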

Re: How to process one partition at a time?

2016-04-07 Thread Andrei
[Int], resultHandler: (Int, U) ⇒ Unit, resultFunc: ⇒ R): SimpleFutureAction[R] >> <http://spark.apache.org/docs/latest/api/scala/org/apache/spark/SimpleFutureAction.html> >> From: Hemant Bhanawat [mai

Re: How to process one partition at a time?

2016-04-06 Thread Andrei
I'm writing a kind of sampler which in most cases will require only 1 partition, sometimes 2 and very rarely more. So it doesn't make sense to process all partitions in parallel. What is the easiest way to limit computations to one partition only? So far the best idea I've come up with is to create a
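One possible way to do this, sketched here with PySpark's SparkContext.runJob (the reply above quotes the corresponding Scala submitJob API); the partition counts are illustrative:

```python
from pyspark import SparkContext

sc = SparkContext("local[4]", "one-partition-at-a-time")
rdd = sc.parallelize(range(1000), numSlices=8)

# runJob evaluates only the partitions you ask for; here only partition 0 is
# computed, the other seven are never touched.
sample = sc.runJob(rdd, lambda it: list(it), partitions=[0])

# If the sample isn't big enough, pull the next partition, and so on.
more = sc.runJob(rdd, lambda it: list(it), partitions=[1])
```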

Re: DeepLearning and Spark ?

2015-01-09 Thread Andrei
Does it make sense to use Spark's actor system (e.g. via SparkContext.env.actorSystem) to create a parameter server? On Fri, Jan 9, 2015 at 10:09 PM, Peng Cheng rhw...@gmail.com wrote: You are not the first :) probably not the fifth to have the question. parameter server is not included in

Re: Spark S3 Performance

2014-11-22 Thread Andrei
Not that I'm a professional user of Amazon services, but I have a guess about your performance issues. From [1], there are two different filesystems over S3: - native, which behaves just like regular files (scheme: s3n) - block-based, which looks more like HDFS (scheme: s3). Since you use s3n in your
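To illustrate the distinction (bucket and paths are made up; this reflects the old Hadoop s3/s3n filesystems discussed in the thread):

```python
from pyspark import SparkContext

sc = SparkContext("local[*]", "s3-schemes")

# "s3n://" -> the *native* S3 filesystem: every S3 object is one ordinary file.
native_rdd = sc.textFile("s3n://my-bucket/logs/2014-11-22/*")

# "s3://" -> the *block-based* filesystem: S3 acts as backing store for
# HDFS-style blocks, so data written this way is not readable as plain objects.
block_rdd = sc.textFile("s3://my-bucket/block-store/data")
```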

Re: Spark S3 Performance

2014-11-22 Thread Andrei
Concerning your second question, I believe you try to set the number of partitions with something like this: rdd = sc.textFile(..., 8) but things like `textFile()` don't actually take a fixed number of partitions. Instead, they expect a *minimal* number of partitions. Since in your file you have 21
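In other words, the second argument to textFile() is only a lower bound; a small sketch with made-up paths:

```python
from pyspark import SparkContext

sc = SparkContext("local[*]", "min-partitions")

# The 8 is only a lower bound: with 21 input splits Spark will still create
# 21 partitions, not 8.
rdd = sc.textFile("hdfs:///data/input/*", 8)
print(rdd.getNumPartitions())

# To pin the count exactly, repartition afterwards (or coalesce to only shrink).
exactly_8 = rdd.repartition(8)
```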

Re: Odd error when using a rdd map within a stream map

2014-09-19 Thread Filip Andrei
Hey, I don't think that's the issue; foreach is called on 'results', which is a DStream of floats, so naturally it passes RDDs to its function. And either way, changing the code in the first mapper to comment out the map-reduce process on the RDD Float f = 1.0f; //nnRdd.map(new FunctionNeuralNet,

Ensuring object in spark streaming runs on specific node

2014-08-29 Thread Filip Andrei
Say you have a spark streaming setup such as JavaReceiverInputDStream... rndLists = jssc.receiverStream(new JavaRandomReceiver(...)); rndLists.map(new NeuralNetMapper(...)) .foreach(new JavaSyncBarrier(...)); Is there any way of ensuring that, say, a JavaRandomReceiver and

Developing a spark streaming application

2014-08-27 Thread Filip Andrei
Hey guys, so the problem I'm trying to tackle is the following: - I need a data source that emits messages at a certain frequency - There are N neural nets that need to process each message individually - The outputs from all neural nets are aggregated and only when all N outputs for each message

Re: Iterator over RDD in PySpark

2014-08-02 Thread Andrei
._collect_iterator_through_file(javaIterator) On Fri, Aug 1, 2014 at 3:04 PM, Andrei faithlessfri...@gmail.com wrote: Thanks, Aaron, it should be fine with partitions (I can repartition it anyway, right?). But rdd.toLocalIterator is a purely Java/Scala method. Is there a Python interface to it? I can get a Java iterator

Iterator over RDD in PySpark

2014-08-01 Thread Andrei
Is there a way to get an iterator from an RDD? Something like rdd.collect(), but returning a lazy sequence and not a single array. Context: I need to GZip processed data to upload it to Amazon S3. Since the archive should be a single file, I want to iterate over the RDD, writing each line to a local .gz file. File
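For reference, later PySpark versions expose RDD.toLocalIterator() directly, which fits this use case; a rough sketch with placeholder paths:

```python
import gzip
from pyspark import SparkContext

sc = SparkContext("local[*]", "rdd-to-gzip")
rdd = sc.textFile("hdfs:///data/processed/part-*")

# toLocalIterator() pulls results back one partition at a time instead of
# collect()'ing everything into memory at once.
with gzip.open("/tmp/output.gz", "wt") as out:
    for line in rdd.toLocalIterator():
        out.write(line + "\n")
```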

Re: Iterator over RDD in PySpark

2014-08-01 Thread Andrei
individual line). Hopefully that's sufficient, though. On Fri, Aug 1, 2014 at 1:38 AM, Andrei faithlessfri...@gmail.com wrote: Is there a way to get an iterator from an RDD? Something like rdd.collect(), but returning a lazy sequence and not a single array. Context: I need to GZip processed data to upload

Re: Recommended pipeline automation tool? Oozie?

2014-07-10 Thread Andrei
I have used both Oozie and Luigi, but found them inflexible and still overcomplicated, especially in the presence of Spark. Oozie has a fixed list of building blocks, which is pretty limiting. For example, you can launch a Hive query, but Impala, Shark/SparkSQL, etc. are out of scope (of course, you can

Re: Purpose of spark-submit?

2014-07-09 Thread Andrei
Another +1. For me it's a question of embedding. With SparkConf/SparkContext I can easily create larger projects with Spark as a separate service (just like MySQL and JDBC, for example). With spark-submit I'm bound to Spark as a main framework that defines how my application should look.

Re: How do you run your spark app?

2014-06-20 Thread Andrei
Hi Shivani, Adding JARs to the classpath (e.g. via the -cp option) is needed to run your _local_ Java application, whatever it is. To deliver them to _other machines_ for execution you need to add them to the SparkContext. And you can do it in two different ways: 1. Add them right from your code (your
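Option 1 ("add them right from your code") can be sketched as follows; in Scala this would be sc.addJar(...), while the PySpark equivalent below uses the spark.jars config, with placeholder paths:

```python
from pyspark import SparkConf, SparkContext

# -cp only affects the local JVM; listing the jars in spark.jars makes Spark
# ship them to the executors on the other machines as well.
conf = (SparkConf()
        .setAppName("app-with-extra-jars")
        .set("spark.jars", "/path/to/dep1.jar,/path/to/dep2.jar"))
sc = SparkContext(conf=conf)
```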

Re: Loading Python libraries into Spark

2014-06-05 Thread Andrei
). [1]: http://spark.apache.org/docs/latest/submitting-applications.html On Thu, Jun 5, 2014 at 8:10 PM, mrm ma...@skimlinks.com wrote: Hi Andrei, Thank you for your help! Just to make sure I understand, when I run this command sc.addPyFile(/path/to/yourmodule.py), I need to be already
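A small end-to-end sketch of that workflow; the module path comes from the thread, while the transform() function is assumed purely for illustration:

```python
from pyspark import SparkContext

sc = SparkContext("local[*]", "add-py-file-demo")

# Ship the module to every executor; after this, worker code can import it by name.
sc.addPyFile("/path/to/yourmodule.py")

def apply_module(x):
    import yourmodule                  # resolvable on executors thanks to addPyFile
    return yourmodule.transform(x)     # transform() is assumed, for illustration

result = sc.parallelize(range(10)).map(apply_module).collect()
```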

Re: Is uberjar a recommended way of running Spark/Scala applications?

2014-06-02 Thread Andrei
.jar files in your Scala program into a directory. It doesn't merge the .jar files together, the .jar files are left as is. On Sat, May 31, 2014 at 3:42 AM, Andrei faithlessfri...@gmail.com wrote: Thanks, Stephen. I have eventually decided to go with assembly, but put away Spark

Re: Is uberjar a recommended way of running Spark/Scala applications?

2014-05-30 Thread Andrei
interactive dev setup - something that doesn't require full rebuild. [1]: https://github.com/faithlessfriend/sample-spark-project Thanks and have a good weekend, Andrei On Thu, May 29, 2014 at 8:27 PM, Stephen Boesch java...@gmail.com wrote: The MergeStrategy combined with sbt assembly did work

Is uberjar a recommended way of running Spark/Scala applications?

2014-05-29 Thread Andrei
are: 1. Is an uberjar a recommended way of running Spark applications? 2. If so, should I include Spark itself in this large jar? 3. If not, what is a recommended way to do both - development and deployment (assuming an ordinary sbt project). Thanks, Andrei

Re: Is uberjar a recommended way of running Spark/Scala applications?

2014-05-29 Thread Andrei
like that for Spark/SBT? Thanks, Andrei On Thu, May 29, 2014 at 3:48 PM, jaranda jordi.ara...@bsc.es wrote: Hi Andrei, I think the preferred way to deploy Spark jobs is by using the sbt package task instead of using the sbt assembly plugin. In any case, as you comment, the mergeStrategy

Re: Computing cosine similiarity using pyspark

2014-05-23 Thread Andrei
Do you need cosine distance and correlation between vectors or between variables (elements of a vector)? It would be helpful if you could tell us details of your task. On Thu, May 22, 2014 at 5:49 PM, jamal sasha jamalsha...@gmail.com wrote: Hi, I have a bunch of vectors like
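If it is cosine similarity between pairs of vectors that's needed, a minimal PySpark sketch (the vectors are made up) could look like this:

```python
import math
from pyspark import SparkContext

sc = SparkContext("local[*]", "cosine-similarity")

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

query = [1.0, 0.0, 2.0]
vectors = sc.parallelize([(1, [0.5, 0.1, 1.9]),
                          (2, [3.0, 4.0, 0.0])])

# Cosine similarity of every vector in the RDD against a single query vector.
print(vectors.mapValues(lambda v: cosine(query, v)).collect())
```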

Proper way to create standalone app with custom Spark version

2014-05-16 Thread Andrei
Spark as a local jar to every project. But both of these ways look overcomplicated and in general wrong. So what is the intended way to do it? Thanks, Andrei

Proper way to create standalone app with custom Spark version

2014-05-14 Thread Andrei
to every project. But both of these ways look overcomplicated and in general wrong. What is the intended way to solve this issue? Thanks, Andrei