Fwd: Saving large textfile

2016-04-24 Thread Simon Hafner
2016-04-24 13:38 GMT+02:00 Stefan Falk:
> sc.parallelize(cfile.toString().split("\n"), 1)
Try `sc.textFile(pathToFile)` instead.
> java.io.IOException: Broken pipe
>   at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>   at
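
A minimal sketch of the suggestion above, assuming `sc` is an existing SparkContext and the file sits on storage every executor can reach (HDFS, S3, or a path mounted on all nodes); the paths are hypothetical:

```scala
// textFile reads the file in a distributed way, one partition per block,
// instead of materializing the whole content on the driver first.
val lines = sc.textFile("hdfs:///path/to/file.txt")
lines.saveAsTextFile("hdfs:///path/to/output")
```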

Re: StreamCorruptedException during deserialization

2016-03-29 Thread Simon Hafner
2016-03-29 11:25 GMT+02:00 Robert Schmidtke:
> Is there a meaningful way for me to find out what exactly is going wrong
> here? Any help and hints are greatly appreciated!
Maybe a version mismatch between the jars on the cluster?

Re: Output is being stored on the clusters (slaves).

2016-03-24 Thread Simon Hafner
2016-03-24 11:09 GMT+01:00 Shishir Anshuman:
> I am using two slaves to run the ALS algorithm. I am saving the predictions
> in a text file using:
> saveAsTextFile(path)
>
> The predictions are getting stored on the slaves, but I want the predictions
> to be saved
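
One hedged way to gather everything on the driver machine instead, assuming `predictions` is the RDD from the thread and the result is small enough to fit in driver memory (the output path is hypothetical):

```scala
import java.io.PrintWriter

// collect() pulls all elements back to the driver; only safe for small results.
val local = predictions.collect()
val out = new PrintWriter("/tmp/predictions.txt")  // a path local to the driver
try local.foreach(out.println) finally out.close()
```

Alternatively, saving with `saveAsTextFile` to a shared filesystem such as HDFS keeps all part files in one place instead of scattering them across slave-local disks.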

Re: No active SparkContext

2016-03-24 Thread Simon Hafner
2016-03-24 9:54 GMT+01:00 Max Schmidt:
> We're using a ScheduledExecutor with the Java API (1.6.0) that continuously
> submits a Spark job to a standalone cluster.
I'd recommend Scala.
> After each job we close the JavaSparkContext and create a new one.
Why do that? You can
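
A minimal sketch of reusing a single context across scheduled jobs, in Scala as recommended above; the app name, job body, and interval are hypothetical:

```scala
import java.util.concurrent.{Executors, TimeUnit}
import org.apache.spark.{SparkConf, SparkContext}

// Create the context once and share it across scheduled runs instead of
// closing and recreating it after every job.
val sc = new SparkContext(new SparkConf().setAppName("scheduled-jobs"))
val scheduler = Executors.newSingleThreadScheduledExecutor()

scheduler.scheduleAtFixedRate(new Runnable {
  def run(): Unit = {
    // the actual job body goes here
    println(sc.parallelize(1 to 100).sum())
  }
}, 0, 10, TimeUnit.MINUTES)
```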

Re: Installing Spark on Mac

2016-03-04 Thread Simon Hafner
I'd try `brew install spark` or `brew install apache-spark` and see where that
gets you. https://github.com/Homebrew/homebrew

2016-03-04 21:18 GMT+01:00 Aida:
> Hi all,
>
> I am a complete novice and was wondering whether anyone would be willing to
> provide me with a step by step

Re: Running synchronized JRI code

2016-02-15 Thread Simon Hafner
2016-02-15 14:02 GMT+01:00 Sun, Rui:
> On computation, RRDD launches one R process for each partition, so there
> won't be a thread-safety issue.
>
> Could you give more details on your new environment?
Running on EC2, I start the executors via /usr/bin/R CMD javareconf -e

Re: Running synchronized JRI code

2016-02-15 Thread Simon Hafner
2016-02-15 4:35 GMT+01:00 Sun, Rui:
> Yes, JRI loads an R dynamic library into the executor JVM, which faces a
> thread-safety issue when there are multiple task threads within the executor.
>
> I am thinking if the demand like yours (calling R code in RDD
> transformations) is

Running synchronized JRI code

2016-02-14 Thread Simon Hafner
Hello, I'm currently running R code in an executor via JRI. Because R is single-threaded, every call into R needs to be wrapped in a `synchronized` block. As a result, I can only use slightly more than one core per executor, which is undesirable. Is there a way to tell Spark that this specific application (or even specific
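
For reference, a minimal sketch of the pattern described above, assuming JRI's `Rengine` is on the executor classpath; the object name and R arguments are hypothetical:

```scala
import org.rosuda.JRI.Rengine

// One R engine per executor JVM. R is single-threaded, so every call
// is funneled through this object's lock via `synchronized`.
object RGateway {
  lazy val engine = new Rengine(Array("--vanilla"), false, null)

  def eval(expr: String) = synchronized {
    engine.eval(expr)
  }
}
```

This is exactly why only about one core per executor does useful work: all task threads queue on the same lock.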

Re: Serializing DataSets

2016-01-19 Thread Simon Hafner
The occasional type error if the casting goes wrong for whatever reason.

2016-01-19 1:22 GMT+08:00 Michael Armbrust <mich...@databricks.com>:
> What error?
>
> On Mon, Jan 18, 2016 at 9:01 AM, Simon Hafner <reactorm...@gmail.com> wrote:
>>
>> And for deseriali

Re: Serializing DataSets

2016-01-18 Thread Simon Hafner
> toDF()). We'll likely be combining the classes
> in Spark 2.0 to remove this awkwardness.
>
> On Tue, Jan 12, 2016 at 11:20 PM, Simon Hafner <reactorm...@gmail.com>
> wrote:
>>
>> What's the proper way to write DataSets to disk
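
A minimal sketch of the workaround discussed in this thread, assuming a Spark 1.6-era `sqlContext`, an existing Dataset `ds`, and a case class `MyRecord` (all hypothetical names):

```scala
import sqlContext.implicits._

// Write by converting to a DataFrame and using its writer...
ds.toDF().write.parquet("/tmp/ds-output")

// ...and recover the typed Dataset on read via as[T].
val restored = sqlContext.read.parquet("/tmp/ds-output").as[MyRecord]
```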

Serializing DataSets

2016-01-12 Thread Simon Hafner
What's the proper way to write DataSets to disk? Convert them to a DataFrame and use the writers there?

Re: Compiling spark 1.5.1 fails with scala.reflect.internal.Types$TypeError: bad symbolic reference.

2015-12-16 Thread Simon Hafner
> , how did you resolve the problem?
>
> On Fri, Oct 16, 2015 at 9:54 AM, Simon Hafner <reactorm...@gmail.com> wrote:
>>
>> Fresh clone of spark 1.5.1, java version "1.7.0_85"
>>
>> build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTest

Fwd: Where does mllib's .save method save a model to?

2015-11-03 Thread Simon Hafner
2015-11-03 20:26 GMT+01:00 xenocyon:
> I want to save an mllib model to disk, and am trying the model.save
> operation as described in
> http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html#examples:
>
> model.save(sc, "myModelPath")
>
> But after running
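
A hedged sketch for pinning down where the model actually lands: a bare relative path is resolved against the default Hadoop filesystem, so passing a fully qualified URI makes the destination unambiguous. This assumes the model from the linked example, a `MatrixFactorizationModel`; the paths are hypothetical:

```scala
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

// An explicit scheme removes the ambiguity between local disk and HDFS.
model.save(sc, "hdfs:///user/me/myModelPath")
val restored = MatrixFactorizationModel.load(sc, "hdfs:///user/me/myModelPath")
```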

Fwd: collect() local faster than 4 node cluster

2015-11-03 Thread Simon Hafner
2015-11-03 20:07 GMT+01:00 Sebastian Kuepers:
> Hey,
>
> with collect() RDD elements are sent as a list back to the driver.
>
> I have a 4 node cluster (based on Mesos) in a datacenter, and I have my
> local dev machine.
>
> I work with a small 200MB

Re: Support Ordering on UserDefinedType

2015-11-03 Thread Simon Hafner
2015-11-03 23:20 GMT+01:00 Ionized:
> TypeUtils.getInterpretedOrdering currently only supports AtomicType and
> StructType. Is it possible to add support for UserDefinedType as well?
Yes, open a PR against Spark.

Compiling spark 1.5.1 fails with scala.reflect.internal.Types$TypeError: bad symbolic reference.

2015-10-16 Thread Simon Hafner
Fresh clone of spark 1.5.1, java version "1.7.0_85"

build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package

[error] bad symbolic reference. A signature in WebUI.class refers to term eclipse
[error] in package org which is not available.
[error] It may be completely missing

udaf with multiple return values in spark 1.5.0

2015-09-06 Thread Simon Hafner
Hi everyone, is it possible to return multiple values from a UDAF defined in Spark 1.5.0? The documentation [1] mentions

abstract def dataType: DataType
The DataType of the returned value of this UserDefinedAggregateFunction.

so it's only possible to return a single value. Should I use
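
One possible workaround, sketched under the assumption that a `StructType` counts as a single `DataType`: declare the UDAF's return type as a struct, so the one returned `Row` carries several values. The example computes min and max together; all names are hypothetical:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

class MinMax extends UserDefinedAggregateFunction {
  def inputSchema: StructType = StructType(StructField("value", DoubleType) :: Nil)
  def bufferSchema: StructType = StructType(
    StructField("min", DoubleType) :: StructField("max", DoubleType) :: Nil)
  // A struct as the single return DataType yields multiple values per group.
  def dataType: DataType = StructType(
    StructField("min", DoubleType) :: StructField("max", DoubleType) :: Nil)
  def deterministic: Boolean = true
  def initialize(buffer: MutableAggregationBuffer): Unit = {
    buffer(0) = Double.MaxValue
    buffer(1) = Double.MinValue
  }
  def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    val v = input.getDouble(0)
    buffer(0) = math.min(buffer.getDouble(0), v)
    buffer(1) = math.max(buffer.getDouble(1), v)
  }
  def merge(b1: MutableAggregationBuffer, b2: Row): Unit = {
    b1(0) = math.min(b1.getDouble(0), b2.getDouble(0))
    b1(1) = math.max(b1.getDouble(1), b2.getDouble(1))
  }
  def evaluate(buffer: Row): Any = Row(buffer.getDouble(0), buffer.getDouble(1))
}
```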

wholeTextFiles on 20 nodes

2014-11-23 Thread Simon Hafner
I have 20 nodes via EC2 and an application that reads the data via wholeTextFiles. I've tried to copy the data into Hadoop via copyFromLocal, and I get

14/11/24 02:00:07 INFO hdfs.DFSClient: Exception in createBlockOutputStream
172.31.2.209:50010 java.io.IOException: Bad connect ack with

log4j logging control via sbt

2014-11-05 Thread Simon Hafner
I've tried to set the log4j logger to warn only, via a log4j properties file:

cat src/test/resources/log4j.properties
log4j.logger.org.apache.spark=WARN

or in sbt via

javaOptions += "-Dlog4j.logger.org.apache.spark=WARN"

But the logger still gives me INFO messages on stdout when I run my tests
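
One thing worth checking, offered as an assumption rather than a confirmed fix: sbt's `javaOptions` only take effect in forked JVMs, so without forking the system property never reaches the test run. A minimal build.sbt sketch:

```scala
// build.sbt -- run tests in a separate JVM so javaOptions apply,
// and point log4j at the properties file explicitly.
fork in Test := true
javaOptions in Test += "-Dlog4j.configuration=file:src/test/resources/log4j.properties"
```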

Spark with HLists

2014-10-29 Thread Simon Hafner
I tried using shapeless HLists as the data storage format inside Spark. Unsurprisingly, it failed: the deserialization isn't well-defined because of all the implicits used by shapeless. How could I make it work? Sample code:

/* SimpleApp.scala */
import org.apache.spark.SparkContext
import
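
A hedged sketch of one way around this: keep serialization-friendly case classes inside the RDD and only convert to HLists on the driver after collect(), where shapeless's compile-time `Generic` machinery never needs to be (de)serialized. All names are hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import shapeless._

case class Person(name: String, age: Int)

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hlist-sketch"))
    // Case classes cross the wire with plain Java/Kryo serialization.
    val people = sc.parallelize(Seq(Person("a", 1), Person("b", 2))).collect()
    // The HList conversion happens locally on the driver, so none of the
    // implicit-derived shapeless instances are ever shipped to executors.
    val hlists = people.map(p => Generic[Person].to(p))
    hlists.foreach(println)
    sc.stop()
  }
}
```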