Re: ZuriHac 2019 - GHC Track

2019-06-10 Thread Simon Hafner
I have a ~30 minute talk which covers my GHC proposal
(NoToplevelFieldSelectors), as well as parts of the renamer. I could give
it at any point, if there are still time slots available.

On Tue, 28 May 2019 at 23:37, Ben Gamari  wrote:

> Andreas Herrmann  writes:
>
> > Dear GHC devs,
> >
> I've been rather quiet on this since it's been unclear whether I will
> be able to make it to ZuriHac this year. While I would love to be there
> (and perhaps do some hiking after), at this point chances are
> unfortunately looking rather slim; it looks like I may have contracted
> Lyme disease so international traveling likely isn't a good idea in the
> next couple of months.
>
> I can, however, try to be around on IRC as festivities are underway.
>
> Cheers,
>
> - Ben
>


Fwd: Saving large textfile

2016-04-24 Thread Simon Hafner
2016-04-24 13:38 GMT+02:00 Stefan Falk :
> sc.parallelize(cfile.toString()
>   .split("\n"), 1)
Try `sc.textFile(pathToFile)` instead.

>java.io.IOException: Broken pipe
>at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>...

That sounds like something is crashing, maybe an OOM? Don't use Spark to
aggregate into a single text file; do that with something else (e.g.
cat) afterwards.
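
Roughly, assuming an existing SparkContext `sc` and made-up paths, that
would look like this:

import org.apache.spark.SparkContext

def process(sc: SparkContext): Unit = {
  // Let Spark read and split the file itself instead of loading it on the
  // driver and calling parallelize on the full string.
  val lines = sc.textFile("hdfs:///data/input.txt")

  // One part-* file per partition; merge them outside of Spark afterwards,
  // e.g. with `hdfs dfs -getmerge` or plain cat.
  lines.filter(_.nonEmpty).saveAsTextFile("hdfs:///data/output")
}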




Re: StreamCorruptedException during deserialization

2016-03-29 Thread Simon Hafner
2016-03-29 11:25 GMT+02:00 Robert Schmidtke :
> Is there a meaningful way for me to find out what exactly is going wrong
> here? Any help and hints are greatly appreciated!
Maybe a version mismatch between the jars on the cluster?




Re: Output is being stored on the clusters (slaves).

2016-03-24 Thread Simon Hafner
2016-03-24 11:09 GMT+01:00 Shishir Anshuman :
> I am using two Slaves to run the ALS algorithm. I am saving the predictions
> in a textfile using :
>   saveAsTextFile(path)
>
> The predictions is getting stored on the slaves but I want the predictions
> to be saved on the Master.
Yes, that is expected behavior. `path` is resolved on the machine where
the task is executed, i.e. the slaves. I'd recommend either using a
cluster FS (e.g. HDFS) or calling .collect() on your data so you can save
it locally on the master. Beware of OOM if your data is big.
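
Roughly, with a hypothetical `predictions` RDD of strings, the two options
look like this:

import java.io.PrintWriter
import org.apache.spark.rdd.RDD

def savePredictions(predictions: RDD[String]): Unit = {
  // Option 1: write to a filesystem every node can see.
  predictions.saveAsTextFile("hdfs:///user/me/predictions")

  // Option 2: pull everything back to the driver on the master and write a
  // local file there; only safe if the data fits into driver memory.
  val local = predictions.collect()
  val out = new PrintWriter("/home/me/predictions.txt")
  try local.foreach(line => out.println(line)) finally out.close()
}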




Re: No active SparkContext

2016-03-24 Thread Simon Hafner
2016-03-24 9:54 GMT+01:00 Max Schmidt :
> we're using with the java-api (1.6.0) a ScheduledExecutor that
continuously
> executes a SparkJob to a standalone cluster.
I'd recommend Scala.

> After each job we close the JavaSparkContext and create a new one.
Why do that? You can happily reuse it. That is most likely also what causes
the other problems, because you have a race condition between waiting for
the job to finish and stopping the context.
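
A rough sketch of that reuse in Scala (job body and interval made up):

import java.util.concurrent.{Executors, TimeUnit}
import org.apache.spark.{SparkConf, SparkContext}

object ScheduledJobs {
  def main(args: Array[String]): Unit = {
    // Create the context once and keep it for the lifetime of the application.
    val sc = new SparkContext(new SparkConf().setAppName("scheduled-jobs"))
    val scheduler = Executors.newSingleThreadScheduledExecutor()

    scheduler.scheduleAtFixedRate(new Runnable {
      def run(): Unit = {
        // The recurring job, running against the same context every time.
        val evens = sc.parallelize(1 to 1000).filter(_ % 2 == 0).count()
        println(s"even numbers: $evens")
      }
    }, 0, 10, TimeUnit.MINUTES)

    // Stop the context only when the whole application shuts down.
    sys.addShutdownHook { scheduler.shutdown(); sc.stop() }
  }
}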


Re: Installing Spark on Mac

2016-03-04 Thread Simon Hafner
I'd try `brew install spark` or `apache-spark` and see where that gets
you. https://github.com/Homebrew/homebrew

2016-03-04 21:18 GMT+01:00 Aida :
> Hi all,
>
> I am a complete novice and was wondering whether anyone would be willing to
> provide me with a step by step guide on how to install Spark on a Mac; on
> standalone mode btw.
>
> I downloaded a prebuilt version, the second version from the top. However, I
> have not installed Hadoop and am not planning to at this stage.
>
> I also downloaded Scala from the Scala website, do I need to download
> anything else?
>
> I am very eager to learn more about Spark but am unsure about the best way
> to do it.
>
> I would be happy for any suggestions or ideas
>
> Many thanks,
>
> Aida
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Installing-Spark-on-Mac-tp26397.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>




Re: Running synchronized JRI code

2016-02-15 Thread Simon Hafner
2016-02-15 14:02 GMT+01:00 Sun, Rui :
> On computation, RRDD launches one R process for each partition, so there 
> won't be thread-safe issue
>
> Could you give more details on your new environment?

Running on EC2, I start the executors via

 /usr/bin/R CMD javareconf -e "/usr/lib/spark/sbin/start-master.sh"

I invoke R via roughly

object R {
  case class Element(value: Double)

  // Reuse the last engine if one already exists, otherwise start a new
  // JRIEngine and load the fitting code from the classpath.
  lazy val re = Option(REngine.getLastEngine()).getOrElse({
    val eng = new JRI.JRIEngine()
    eng.parseAndEval(scala.io.Source.fromInputStream(
      this.getClass().getClassLoader().getResourceAsStream("r/fit.R")).mkString)
    eng
  })

  def fit(curve: Seq[Element]): Option[Fitting] = {
    // R is single-threaded, so every call into the engine is serialized.
    synchronized {
      val env = re.newEnvironment(null, false)
      re.assign("curve", new REXPDouble(curve.map(_.value).toArray), env)
      val df = re.parseAndEval("data.frame(curve=curve)", env, true)
      re.assign("df", df, env)
      val fitted = re.parseAndEval("fit(df)", env, true).asList
      if (fitted.keys == null) {
        None
      } else {
        val map = fitted.keys.map(key =>
          (key, fitted.at(key).asDouble)).toMap
        Some(Fitting(map("values")))
      }
    }
  }
}

where `fit` is wrapped in an UDAF.




Re: Running synchronized JRI code

2016-02-15 Thread Simon Hafner
2016-02-15 4:35 GMT+01:00 Sun, Rui :
> Yes, JRI loads an R dynamic library into the executor JVM, which faces 
> thread-safe issue when there are multiple task threads within the executor.
>
> I am thinking if the demand like yours (calling R code in RDD 
> transformations) is much desired, we may consider refactoring RRDD for this 
> purpose, although it is currently intended for internal use by SparkR and not 
> a public API.
So the RRDDs don't have that thread safety issue? I'm currently
creating a new environment for each call, but it still crashes.




Running synchronized JRI code

2016-02-14 Thread Simon Hafner
Hello

I'm currently running R code in an executor via JRI. Because R is
single-threaded, any call to R needs to be wrapped in a
`synchronized`. As a result I can only use a bit more than one core per
executor, which is undesirable. Is there a way to tell Spark that this
specific application (or even a specific UDF) needs multiple JVMs? Or
should I switch from JRI to a pipe-based (slower) setup?

Cheers,
Simon




Re: Serializing DataSets

2016-01-19 Thread Simon Hafner
The occasional type error if the casting goes wrong for whatever reason.

2016-01-19 1:22 GMT+08:00 Michael Armbrust <mich...@databricks.com>:
> What error?
>
> On Mon, Jan 18, 2016 at 9:01 AM, Simon Hafner <reactorm...@gmail.com> wrote:
>>
>> And for deserializing,
>> `sqlContext.read.parquet("path/to/parquet").as[T]` and catch the
>> error?
>>
>> 2016-01-14 3:43 GMT+08:00 Michael Armbrust <mich...@databricks.com>:
> > Yeah, that's the best way for now (note the conversion is purely logical
>> > so
>> > there is no cost of calling toDF()).  We'll likely be combining the
>> > classes
>> > in Spark 2.0 to remove this awkwardness.
>> >
>> > On Tue, Jan 12, 2016 at 11:20 PM, Simon Hafner <reactorm...@gmail.com>
>> > wrote:
>> >>
>> >> What's the proper way to write DataSets to disk? Convert them to a
>> >> DataFrame and use the writers there?
>> >>
>> >
>
>




Re: Serializing DataSets

2016-01-18 Thread Simon Hafner
And for deserializing,
`sqlContext.read.parquet("path/to/parquet").as[T]` and catch the
error?

2016-01-14 3:43 GMT+08:00 Michael Armbrust <mich...@databricks.com>:
> Yeah, that's the best way for now (note the conversion is purely logical so
> there is no cost of calling toDF()).  We'll likely be combining the classes
> in Spark 2.0 to remove this awkwardness.
>
> On Tue, Jan 12, 2016 at 11:20 PM, Simon Hafner <reactorm...@gmail.com>
> wrote:
>>
>> What's the proper way to write DataSets to disk? Convert them to a
>> DataFrame and use the writers there?
>>
>




[jira] [Commented] (SPARK-12677) Lazy file discovery for parquet

2016-01-13 Thread Simon Hafner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15096594#comment-15096594
 ] 

Simon Hafner commented on SPARK-12677:
--

What would be the gain? The application would crash with a different error, but 
it would still crash.

> Lazy file discovery for parquet
> ---
>
> Key: SPARK-12677
> URL: https://issues.apache.org/jira/browse/SPARK-12677
> Project: Spark
>  Issue Type: Wish
>  Components: SQL
>Reporter: Tiago Albineli Motta
>Priority: Minor
>  Labels: features
>
> When using sqlContext.read.parquet(files: _*) the driver verifies if 
> everything is ok with the files. But reading those files is lazy, so when it 
> starts maybe the files are not there anymore, or they have changed, so we 
> receive this error message:
> {quote}
> 16/01/06 10:52:43 ERROR yarn.ApplicationMaster: User class threw exception: 
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 4 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 4.3 in stage 0.0 
> (TID 16, riolb586.globoi.com): java.io.FileNotFoundException: File does not 
> exist: 
> hdfs://mynamenode.com:8020/rec/prefs/2016/01/06/part-r-3-27a100b0-ff49-45ad-8803-e6cc77286661.gz.parquet
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
>   at 
> parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:381)
>   at 
> parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:155)
>   at 
> parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:138)
>   at 
> org.apache.spark.sql.sources.SqlNewHadoopRDD$$anon$1.(SqlNewHadoopRDD.scala:153)
>   at 
> org.apache.spark.sql.sources.SqlNewHadoopRDD.compute(SqlNewHadoopRDD.scala:124)
>   at 
> org.apache.spark.sql.sources.SqlNewHadoopRDD.compute(SqlNewHadoopRDD.scala:66)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>   at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
>   at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>   at org.apache.spark.scheduler.Task.run(Task.scala:70)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> {quote}
> Maybe if  sqlContext.read.parquet could receive a Function to discover the 
> files instead it could be avoided. Like this: sqlContext.read.parquet( () => 
> files )







Serializing DataSets

2016-01-12 Thread Simon Hafner
What's the proper way to write DataSets to disk? Convert them to a
DataFrame and use the writers there?




Re: Compiling spark 1.5.1 fails with scala.reflect.internal.Types$TypeError: bad symbolic reference.

2015-12-16 Thread Simon Hafner
It happens with 2.11, you'll have to do both:

./dev/change-scala-version.sh 2.11
mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -DskipTests clean package

you get that error if you forget one IIRC.

2015-12-05 20:17 GMT+08:00 MrAsanjar . <afsan...@gmail.com>:
> Simon, I am getting the same error, how did you resolve the problem.
>
> On Fri, Oct 16, 2015 at 9:54 AM, Simon Hafner <reactorm...@gmail.com> wrote:
>>
>> Fresh clone of spark 1.5.1, java version "1.7.0_85"
>>
>> build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean
>> package
>>
>> [error] bad symbolic reference. A signature in WebUI.class refers to
>> term eclipse
>> [error] in package org which is not available.
>> [error] It may be completely missing from the current classpath, or
>> the version on
>> [error] the classpath might be incompatible with the version used when
>> compiling WebUI.class.
>> [error] bad symbolic reference. A signature in WebUI.class refers to term
>> jetty
>> [error] in value org.eclipse which is not available.
>> [error] It may be completely missing from the current classpath, or
>> the version on
>> [error] the classpath might be incompatible with the version used when
>> compiling WebUI.class.
>> [error]
>> [error]  while compiling:
>> /root/spark/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
>> [error] during phase: erasure
>> [error]  library version: version 2.10.4
>> [error] compiler version: version 2.10.4
>> [error]   reconstructed args: -deprecation -classpath
>>
>> /root/spark/sql/core/target/scala-2.10/classes:/root/.m2/repository/org/apache/spark/spark-core_2.10/1.6.0-SNAPSHOT/spark-core_2.10-1.6.0-SNAPSHOT.jar:/root/.m
>>
>> 2/repository/org/apache/avro/avro-mapred/1.7.7/avro-mapred-1.7.7-hadoop2.jar:/root/.m2/repository/org/apache/avro/avro-ipc/1.7.7/avro-ipc-1.7.7.jar:/root/.m2/repository/org/apache/avro/avro-ipc/1.7.7/avro-ipc-1.7.
>>
>> 7-tests.jar:/root/.m2/repository/com/twitter/chill_2.10/0.5.0/chill_2.10-0.5.0.jar:/root/.m2/repository/com/esotericsoftware/kryo/kryo/2.21/kryo-2.21.jar:/root/.m2/repository/com/esotericsoftware/reflectasm/reflec
>>
>> tasm/1.07/reflectasm-1.07-shaded.jar:/root/.m2/repository/com/esotericsoftware/minlog/minlog/1.2/minlog-1.2.jar:/root/.m2/repository/com/twitter/chill-java/0.5.0/chill-java-0.5.0.jar:/root/.m2/repository/org/apach
>>
>> e/hadoop/hadoop-client/2.4.0/hadoop-client-2.4.0.jar:/root/.m2/repository/org/apache/hadoop/hadoop-common/2.4.0/hadoop-common-2.4.0.jar:/root/.m2/repository/commons-cli/commons-cli/1.2/commons-cli-1.2.jar:/root/.m
>>
>> 2/repository/xmlenc/xmlenc/0.52/xmlenc-0.52.jar:/root/.m2/repository/commons-httpclient/commons-httpclient/3.1/commons-httpclient-3.1.jar:/root/.m2/repository/commons-collections/commons-collections/3.2.1/commons-
>>
>> collections-3.2.1.jar:/root/.m2/repository/commons-configuration/commons-configuration/1.6/commons-configuration-1.6.jar:/root/.m2/repository/commons-digester/commons-digester/1.8/commons-digester-1.8.jar:/root/.m
>>
>> 2/repository/commons-beanutils/commons-beanutils/1.7.0/commons-beanutils-1.7.0.jar:/root/.m2/repository/commons-beanutils/commons-beanutils-core/1.8.0/commons-beanutils-core-1.8.0.jar:/root/.m2/repository/org/apac
>>
>> he/hadoop/hadoop-auth/2.4.0/hadoop-auth-2.4.0.jar:/root/.m2/repository/org/apache/hadoop/hadoop-hdfs/2.4.0/hadoop-hdfs-2.4.0.jar:/root/.m2/repository/org/mortbay/jetty/jetty-util/6.1.26/jetty-util-6.1.26.jar:/root
>>
>> /.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-app/2.4.0/hadoop-mapreduce-client-app-2.4.0.jar:/root/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-common/2.4.0/hadoop-mapreduce-client-common-
>>
>> 2.4.0.jar:/root/.m2/repository/org/apache/hadoop/hadoop-yarn-client/2.4.0/hadoop-yarn-client-2.4.0.jar:/root/.m2/repository/com/sun/jersey/jersey-client/1.9/jersey-client-1.9.jar:/root/.m2/repository/org/apache/ha
>>
>> doop/hadoop-yarn-server-common/2.4.0/hadoop-yarn-server-common-2.4.0.jar:/root/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-shuffle/2.4.0/hadoop-mapreduce-client-shuffle-2.4.0.jar:/root/.m2/repository/
>>
>> org/apache/hadoop/hadoop-yarn-api/2.4.0/hadoop-yarn-api-2.4.0.jar:/root/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-core/2.4.0/hadoop-mapreduce-client-core-2.4.0.jar:/root/.m2/repository/org/apache/ha
>>
>> doop/hadoop-yarn-common/2.4.0/hadoop-yarn-common-2.4.0.jar:/root/.m2/repository/javax/xml/bind/jaxb-api/2.2.2/jaxb-api-2.2.2.jar:/root/.m2/repository/javax/xml/stream/stax-api/1.0-2/stax-api-1.0-2.jar:/root/.m2/re
>>
>> pository/org/apache/hadoop/hadoop-mapreduce-client-jobclient

[jira] [Commented] (SPARK-11539) Debian packaging

2015-11-05 Thread Simon Hafner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992587#comment-14992587
 ] 

Simon Hafner commented on SPARK-11539:
--

sbt-native-packager makes it slightly easier.
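
For reference, a rough sbt sketch of what I mean (plugin and task names from
memory, so double-check against the sbt-native-packager docs):

// build.sbt -- assumes sbt-native-packager is added in project/plugins.sbt
enablePlugins(JavaServerAppPackaging, SystemdPlugin, JDebPackaging)

maintainer := "Someone <someone@example.org>"
packageSummary := "Apache Spark"
packageDescription := "Spark daemons packaged as a .deb with systemd units."

// `sbt debian:packageBin` should then produce the .deb.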

> Debian packaging
> 
>
> Key: SPARK-11539
> URL: https://issues.apache.org/jira/browse/SPARK-11539
> Project: Spark
>  Issue Type: New Feature
>  Components: Build
>    Reporter: Simon Hafner
>
> This patch adds debian packaging with systemd.







[jira] [Created] (SPARK-11539) Debian packaging

2015-11-05 Thread Simon Hafner (JIRA)
Simon Hafner created SPARK-11539:


 Summary: Debian packaging
 Key: SPARK-11539
 URL: https://issues.apache.org/jira/browse/SPARK-11539
 Project: Spark
  Issue Type: New Feature
  Components: Build
Reporter: Simon Hafner


This patch adds debian packaging with systemd.







[jira] [Issue Comment Deleted] (SPARK-11539) Debian packaging

2015-11-05 Thread Simon Hafner (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Hafner updated SPARK-11539:
-
Comment: was deleted

(was: sbt-native-packager makes it slightly easier.)

> Debian packaging
> 
>
> Key: SPARK-11539
> URL: https://issues.apache.org/jira/browse/SPARK-11539
> Project: Spark
>  Issue Type: New Feature
>  Components: Build
>    Reporter: Simon Hafner
>
> This patch adds debian packaging with systemd.







Fwd: Where does mllib's .save method save a model to?

2015-11-03 Thread Simon Hafner
2015-11-03 20:26 GMT+01:00 xenocyon :
> I want to save an mllib model to disk, and am trying the model.save
> operation as described in
> http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html#examples:
>
> model.save(sc, "myModelPath")
>
> But after running it, I am unable to find any newly created file or
> dir by the name "myModelPath" in any obvious places. Any ideas where
> it might lie?
In the HDFS configured by your Spark instance. If you want to store it
in your local file system, use

file:///path/to/model

instead.
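
Concretely (the path is made up), for e.g. an ALS model:

import org.apache.spark.SparkContext
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

def saveModel(sc: SparkContext, model: MatrixFactorizationModel): Unit = {
  // Without a scheme the path is resolved against the default filesystem
  // (HDFS on most clusters); file:// forces the local filesystem instead.
  model.save(sc, "file:///tmp/myModelPath")
}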




Fwd: collect() local faster than 4 node cluster

2015-11-03 Thread Simon Hafner
2015-11-03 20:07 GMT+01:00 Sebastian Kuepers
:
> Hey,
>
> with collect() RDDs elements are send as a list back to the driver.
>
> If have a 4 node cluster (based on Mesos) in a datacenter and I have my
> local dev machine.
>
> I work with a small 200MB dataset just for testing during development right
> now.
>
> The collect() tasks are running for times faster on my local machine, than
> on the cluster, although it actually uses 4x the number of cores etc.
>
> It's 7 seconds locally and 28 seconds on the cluster for the same collect()
> job.
>
> What's the reason for that? Is that just network latency sending back the
> data to the driver within the cluster? (well it's just this 200MB in total)
>
> Is that somehow a kind of 'management overhead' form Mesos?
>
> Appreciate any thoughts an possible impacts for that!
Serialization and sending data over the network take time, way more than
simply processing the data on the same machine. But the local setup doesn't
scale as well. Try with more data and plot the results.
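
A crude way to do that (sizes and the RDD are made up):

import org.apache.spark.SparkContext

def timeCollect(sc: SparkContext): Unit = {
  for (n <- Seq(100000, 1000000, 10000000)) {
    val rdd = sc.parallelize(1 to n).map(_ * 2)
    val start = System.nanoTime()
    rdd.collect()
    val secs = (System.nanoTime() - start) / 1e9
    println(s"n=$n: collect() took $secs s")
  }
}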




Re: Support Ordering on UserDefinedType

2015-11-03 Thread Simon Hafner
2015-11-03 23:20 GMT+01:00 Ionized :
> TypeUtils.getInterpretedOrdering currently only supports AtomicType and
> StructType. Is it possible to add support for UserDefinedType as well?
Yes, make a PR to Spark.

https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala#L57-L62




[jira] [Created] (SPARK-11268) Non-daemon startup scripts

2015-10-22 Thread Simon Hafner (JIRA)
Simon Hafner created SPARK-11268:


 Summary: Non-daemon startup scripts
 Key: SPARK-11268
 URL: https://issues.apache.org/jira/browse/SPARK-11268
 Project: Spark
  Issue Type: Improvement
  Components: Deploy
Reporter: Simon Hafner


The current submit scripts fork and write the logs to /var/log/spark. It would 
be nice to have an option to just exec the process and log to stdout, so 
systemd can collect the logs.







Compiling spark 1.5.1 fails with scala.reflect.internal.Types$TypeError: bad symbolic reference.

2015-10-16 Thread Simon Hafner
Fresh clone of spark 1.5.1, java version "1.7.0_85"

build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package

[error] bad symbolic reference. A signature in WebUI.class refers to
term eclipse
[error] in package org which is not available.
[error] It may be completely missing from the current classpath, or
the version on
[error] the classpath might be incompatible with the version used when
compiling WebUI.class.
[error] bad symbolic reference. A signature in WebUI.class refers to term jetty
[error] in value org.eclipse which is not available.
[error] It may be completely missing from the current classpath, or
the version on
[error] the classpath might be incompatible with the version used when
compiling WebUI.class.
[error]
[error]  while compiling:
/root/spark/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
[error] during phase: erasure
[error]  library version: version 2.10.4
[error] compiler version: version 2.10.4
[error]   reconstructed args: -deprecation -classpath
/root/spark/sql/core/target/scala-2.10/classes:/root/.m2/repository/org/apache/spark/spark-core_2.10/1.6.0-SNAPSHOT/spark-core_2.10-1.6.0-SNAPSHOT.jar:/root/.m
2/repository/org/apache/avro/avro-mapred/1.7.7/avro-mapred-1.7.7-hadoop2.jar:/root/.m2/repository/org/apache/avro/avro-ipc/1.7.7/avro-ipc-1.7.7.jar:/root/.m2/repository/org/apache/avro/avro-ipc/1.7.7/avro-ipc-1.7.
7-tests.jar:/root/.m2/repository/com/twitter/chill_2.10/0.5.0/chill_2.10-0.5.0.jar:/root/.m2/repository/com/esotericsoftware/kryo/kryo/2.21/kryo-2.21.jar:/root/.m2/repository/com/esotericsoftware/reflectasm/reflec
tasm/1.07/reflectasm-1.07-shaded.jar:/root/.m2/repository/com/esotericsoftware/minlog/minlog/1.2/minlog-1.2.jar:/root/.m2/repository/com/twitter/chill-java/0.5.0/chill-java-0.5.0.jar:/root/.m2/repository/org/apach
e/hadoop/hadoop-client/2.4.0/hadoop-client-2.4.0.jar:/root/.m2/repository/org/apache/hadoop/hadoop-common/2.4.0/hadoop-common-2.4.0.jar:/root/.m2/repository/commons-cli/commons-cli/1.2/commons-cli-1.2.jar:/root/.m
2/repository/xmlenc/xmlenc/0.52/xmlenc-0.52.jar:/root/.m2/repository/commons-httpclient/commons-httpclient/3.1/commons-httpclient-3.1.jar:/root/.m2/repository/commons-collections/commons-collections/3.2.1/commons-
collections-3.2.1.jar:/root/.m2/repository/commons-configuration/commons-configuration/1.6/commons-configuration-1.6.jar:/root/.m2/repository/commons-digester/commons-digester/1.8/commons-digester-1.8.jar:/root/.m
2/repository/commons-beanutils/commons-beanutils/1.7.0/commons-beanutils-1.7.0.jar:/root/.m2/repository/commons-beanutils/commons-beanutils-core/1.8.0/commons-beanutils-core-1.8.0.jar:/root/.m2/repository/org/apac
he/hadoop/hadoop-auth/2.4.0/hadoop-auth-2.4.0.jar:/root/.m2/repository/org/apache/hadoop/hadoop-hdfs/2.4.0/hadoop-hdfs-2.4.0.jar:/root/.m2/repository/org/mortbay/jetty/jetty-util/6.1.26/jetty-util-6.1.26.jar:/root
/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-app/2.4.0/hadoop-mapreduce-client-app-2.4.0.jar:/root/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-common/2.4.0/hadoop-mapreduce-client-common-
2.4.0.jar:/root/.m2/repository/org/apache/hadoop/hadoop-yarn-client/2.4.0/hadoop-yarn-client-2.4.0.jar:/root/.m2/repository/com/sun/jersey/jersey-client/1.9/jersey-client-1.9.jar:/root/.m2/repository/org/apache/ha
doop/hadoop-yarn-server-common/2.4.0/hadoop-yarn-server-common-2.4.0.jar:/root/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-shuffle/2.4.0/hadoop-mapreduce-client-shuffle-2.4.0.jar:/root/.m2/repository/
org/apache/hadoop/hadoop-yarn-api/2.4.0/hadoop-yarn-api-2.4.0.jar:/root/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-core/2.4.0/hadoop-mapreduce-client-core-2.4.0.jar:/root/.m2/repository/org/apache/ha
doop/hadoop-yarn-common/2.4.0/hadoop-yarn-common-2.4.0.jar:/root/.m2/repository/javax/xml/bind/jaxb-api/2.2.2/jaxb-api-2.2.2.jar:/root/.m2/repository/javax/xml/stream/stax-api/1.0-2/stax-api-1.0-2.jar:/root/.m2/re
pository/org/apache/hadoop/hadoop-mapreduce-client-jobclient/2.4.0/hadoop-mapreduce-client-jobclient-2.4.0.jar:/root/.m2/repository/org/apache/hadoop/hadoop-annotations/2.4.0/hadoop-annotations-2.4.0.jar:/root/.m2
/repository/org/apache/spark/spark-launcher_2.10/1.6.0-SNAPSHOT/spark-launcher_2.10-1.6.0-SNAPSHOT.jar:/root/.m2/repository/org/apache/spark/spark-network-common_2.10/1.6.0-SNAPSHOT/spark-network-common_2.10-1.6.0
-SNAPSHOT.jar:/root/.m2/repository/org/apache/spark/spark-network-shuffle_2.10/1.6.0-SNAPSHOT/spark-network-shuffle_2.10-1.6.0-SNAPSHOT.jar:/root/.m2/repository/org/fusesource/leveldbjni/leveldbjni-all/1.8/leveldb
jni-all-1.8.jar:/root/.m2/repository/org/apache/spark/spark-unsafe_2.10/1.6.0-SNAPSHOT/spark-unsafe_2.10-1.6.0-SNAPSHOT.jar:/root/.m2/repository/net/java/dev/jets3t/jets3t/0.9.3/jets3t-0.9.3.jar:/root/.m2/reposito

udaf with multiple return values in spark 1.5.0

2015-09-06 Thread Simon Hafner
Hi everyone

is it possible to return multiple values with an udaf defined in spark
1.5.0? The documentation [1] mentions

abstract def dataType: DataType
The DataType of the returned value of this UserDefinedAggregateFunction.

so it's only possible to return a single value. Should I use ArrayType
as a WA here? The returned values are all doubles.
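
To make the question concrete, the workaround I have in mind looks roughly
like this (the min/max aggregate is just an example):

import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

// Returns two doubles (min and max) packed into a single array column.
class MinMax extends UserDefinedAggregateFunction {
  def inputSchema: StructType = StructType(StructField("value", DoubleType) :: Nil)
  def bufferSchema: StructType = StructType(
    StructField("min", DoubleType) :: StructField("max", DoubleType) :: Nil)
  def dataType: DataType = ArrayType(DoubleType)
  def deterministic: Boolean = true
  def initialize(buffer: MutableAggregationBuffer): Unit = {
    buffer(0) = Double.MaxValue
    buffer(1) = Double.MinValue
  }
  def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    buffer(0) = math.min(buffer.getDouble(0), input.getDouble(0))
    buffer(1) = math.max(buffer.getDouble(1), input.getDouble(0))
  }
  def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1(0) = math.min(buffer1.getDouble(0), buffer2.getDouble(0))
    buffer1(1) = math.max(buffer1.getDouble(1), buffer2.getDouble(1))
  }
  def evaluate(buffer: Row): Any = Seq(buffer.getDouble(0), buffer.getDouble(1))
}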

Cheers

[1] 
https://people.apache.org/~pwendell/spark-releases/spark-1.5.0-rc3-docs/api/scala/index.html#org.apache.spark.sql.expressions.UserDefinedAggregateFunction




[jira] [Created] (SPARK-10053) SparkR isn't exporting lapply

2015-08-17 Thread Simon Hafner (JIRA)
Simon Hafner created SPARK-10053:


 Summary: SparkR isn't exporting lapply
 Key: SPARK-10053
 URL: https://issues.apache.org/jira/browse/SPARK-10053
 Project: Spark
  Issue Type: Bug
  Components: SparkR
Affects Versions: 1.4.1
Reporter: Simon Hafner


SparkR isn't exporting lapply and lapplyPartition (anymore?). There is no other 
function exported to enable distributed calculations over DataFrames (except 
groupBy). https://spark.apache.org/docs/latest/api/R/index.html







[jira] [Created] (SPARK-8821) The ec2 script doesn't run on python 3 with an utf8 env

2015-07-03 Thread Simon Hafner (JIRA)
Simon Hafner created SPARK-8821:
---

 Summary: The ec2 script doesn't run on python 3 with an utf8 env
 Key: SPARK-8821
 URL: https://issues.apache.org/jira/browse/SPARK-8821
 Project: Spark
  Issue Type: Bug
  Components: EC2
Affects Versions: 1.4.0
 Environment: Archlinux, UTF8 LANG env
Reporter: Simon Hafner


Otherwise the script will crash with

 - Downloading boto...
Traceback (most recent call last):
  File "ec2/spark_ec2.py", line 148, in <module>
    setup_external_libs(external_libs)
  File "ec2/spark_ec2.py", line 128, in setup_external_libs
    if hashlib.md5(tar.read()).hexdigest() != lib["md5"]:
  File "/usr/lib/python3.4/codecs.py", line 319, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

In case of an utf8 env setting.







Re: DUCC doesn't use all available machines

2014-11-30 Thread Simon Hafner
2014-11-30 7:25 GMT-06:00 Eddie Epstein eaepst...@gmail.com:
 On Sat, Nov 29, 2014 at 4:46 PM, Simon Hafner reactorm...@gmail.com wrote:

 I've thrown some numbers at it (doubling each) and it's running at
 comfortable 125 procs. However, at about 6.1k of 6.5k items, the procs
 drop down to 30.


 125 processes at 8 threads each = 1000 active pipelines. How many CPU cores
 are these 1000 pipelines running on?
Only 60.


Re: DUCC org.apache.uima.util.InvalidXMLException and no logs

2014-11-28 Thread Simon Hafner
2014-11-27 11:44 GMT-06:00 Eddie Epstein eaepst...@gmail.com:
 Those are the only two log files? Should be a ducc.log (probably with no
 more info than on the console), and either one or both of the job driver
 logfiles: jd.out.log and jobid-JD-jdnode-jdpid.log. If for some reason the
 job driver failed to start, check the job driver agent log (the agent
 managing the System/JobDriver reservation) for more info on what happened.

The job driver logs do not exist. I rebooted the machine and now it
works. I'll take a look at the agent log next time.


Ducc: Rename failed

2014-11-28 Thread Simon Hafner
When running DUCC in cluster mode, I get "Rename failed". The file
mentioned in the error message exists in the txt.processed/ directory.
The mount is via nfs (rw,sync,insecure).

org.apache.uima.resource.ResourceProcessException: Received Exception
In Message From Service on Queue:ducc.jd.queue.75 Broker:
tcp://10.0.0.164:61617?jms.useCompression=true Cas
Identifier:18acd63:149f6f562d3:-7fa6 Exception:{3}
at 
org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.sendAndReceiveCAS(BaseUIMAAsynchronousEngineCommon_impl.java:2230)
at 
org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.sendAndReceiveCAS(BaseUIMAAsynchronousEngineCommon_impl.java:2049)
at org.apache.uima.ducc.jd.client.WorkItem.run(WorkItem.java:145)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.uima.aae.error.UimaEEServiceException:
org.apache.uima.analysis_engine.AnalysisEngineProcessException
at 
org.apache.uima.adapter.jms.activemq.JmsOutputChannel.sendReply(JmsOutputChannel.java:932)
at 
org.apache.uima.aae.controller.BaseAnalysisEngineController.handleAction(BaseAnalysisEngineController.java:1172)
at 
org.apache.uima.aae.controller.PrimitiveAnalysisEngineController_impl.takeAction(PrimitiveAnalysisEngineController_impl.java:1145)
at 
org.apache.uima.aae.error.handler.ProcessCasErrorHandler.handleError(ProcessCasErrorHandler.java:405)
at 
org.apache.uima.aae.error.ErrorHandlerChain.handle(ErrorHandlerChain.java:57)
at 
org.apache.uima.aae.controller.PrimitiveAnalysisEngineController_impl.process(PrimitiveAnalysisEngineController_impl.java:1065)
at 
org.apache.uima.aae.handler.HandlerBase.invokeProcess(HandlerBase.java:121)
at 
org.apache.uima.aae.handler.input.ProcessRequestHandler_impl.handleProcessRequestFromRemoteClient(ProcessRequestHandler_impl.java:543)
at 
org.apache.uima.aae.handler.input.ProcessRequestHandler_impl.handle(ProcessRequestHandler_impl.java:1050)
at 
org.apache.uima.aae.handler.input.MetadataRequestHandler_impl.handle(MetadataRequestHandler_impl.java:78)
at 
org.apache.uima.adapter.jms.activemq.JmsInputChannel.onMessage(JmsInputChannel.java:728)
at 
org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:535)
at 
org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:495)
at 
org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:467)
at 
org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:325)
at 
org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:263)
at 
org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1058)
at 
org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:952)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at 
org.apache.uima.aae.UimaAsThreadFactory$1.run(UimaAsThreadFactory.java:129)
... 1 more
Caused by: org.apache.uima.analysis_engine.AnalysisEngineProcessException
at org.apache.uima.ducc.sampleapps.DuccCasCC.process(DuccCasCC.java:117)
at 
org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
at 
org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:385)
at 
org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:309)
at 
org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:569)
at 
org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.init(ASB_impl.java:411)
at 
org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:344)
at 
org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:266)
at 

Re: Ducc: Rename failed

2014-11-28 Thread Simon Hafner
2014-11-28 14:18 GMT-06:00 Eddie Epstein eaepst...@gmail.com:
 To debug, please add the following option to the job submission:
 --all_in_one local

 This will run all the code in a single process on the machine doing the
 submit. Hopefully the log file and/or console will be more informative.
Yes, that helped. It was a missing classpath.


DUCC org.apache.uima.util.InvalidXMLException and no logs

2014-11-26 Thread Simon Hafner
When launching the Raw Text example application, it fails to load with
the following error:

[ducc@ip-10-0-0-164 analysis]$ MyAppDir=$PWD MyInputDir=$PWD/txt
MyOutputDir=$PWD/txt.processed ~/ducc_install/bin/ducc_submit -f
DuccRawTextSpec.job
Job 50 submitted
id:50 location:5991@ip-10-0-0-164
id:50 state:WaitingForDriver
id:50 state:Completing total:-1 done:0 error:0 retry:0 procs:0
id:50 state:Completed total:-1 done:0 error:0 retry:0 procs:0
id:50 rationale:job driver exception occurred:
org.apache.uima.util.InvalidXMLException at
org.apache.uima.ducc.common.uima.UimaUtils.getXMLInputSource(UimaUtils.java:246)

However, there are no logs with a stack trace or similar; how do I get
hold of one? The only files in the log directory are:

[ducc@ip-10-0-0-164 analysis]$ cat logs/50/specified-by-user.properties
#Thu Nov 27 03:00:57 UTC 2014
working_directory=/home/ducc/analysis
process_descriptor_CM=org.apache.uima.ducc.sampleapps.DuccTextCM
driver_descriptor_CR=org.apache.uima.ducc.sampleapps.DuccJobTextCR
cancel_on_interrupt=
process_descriptor_CC_overrides=UseBinaryCompression\=true
process_descriptor_CC=org.apache.uima.ducc.sampleapps.DuccCasCC
log_directory=/home/ducc/analysis/logs
wait_for_completion=
classpath=/home/ducc/analysis/lib/*
process_thread_count=8
driver_descriptor_CR_overrides=BlockSize\=10 SendToLast\=true
InputDirectory\=/home/ducc/analysis/txt
OutputDirectory\=/home/ducc/analysis/txt.processed
process_per_item_time_max=20
process_descriptor_AE=/home/ducc/analysis/opennlp.uima.OpenNlpTextAnalyzer/opennlp.uima.OpenNlpTextAnalyzer_pear.xml
description=DUCC raw text sample application
process_jvm_args=-Xmx3G -XX\:+UseCompressedOops
-Djava.util.logging.config.file\=/home/ducc/analysis/ConsoleLogger.properties
scheduling_class=normal
process_memory_size=4
specification=DuccRawTextSpec.job

[ducc@ip-10-0-0-164 analysis]$ cat logs/50/job-specification.properties
#Thu Nov 27 03:00:57 UTC 2014
working_directory=/home/ducc/analysis
process_descriptor_CM=org.apache.uima.ducc.sampleapps.DuccTextCM
process_failures_limit=20
driver_descriptor_CR=org.apache.uima.ducc.sampleapps.DuccJobTextCR
cancel_on_interrupt=
process_descriptor_CC_overrides=UseBinaryCompression\=true
process_descriptor_CC=org.apache.uima.ducc.sampleapps.DuccCasCC
classpath_order=ducc-before-user
log_directory=/home/ducc/analysis/logs
submitter_pid_at_host=5991@ip-10-0-0-164
wait_for_completion=
classpath=/home/ducc/analysis/lib/*
process_thread_count=8
driver_descriptor_CR_overrides=BlockSize\=10 SendToLast\=true
InputDirectory\=/home/ducc/analysis/txt
OutputDirectory\=/home/ducc/analysis/txt.processed
process_initialization_failures_cap=99
process_per_item_time_max=20
process_descriptor_AE=/home/ducc/analysis/opennlp.uima.OpenNlpTextAnalyzer/opennlp.uima.OpenNlpTextAnalyzer_pear.xml
description=DUCC raw text sample application
process_jvm_args=-Xmx3G -XX\:+UseCompressedOops
-Djava.util.logging.config.file\=/home/ducc/analysis/ConsoleLogger.properties
scheduling_class=normal
environment=HOME\=/home/ducc LANG\=en_US.UTF-8 USER\=ducc
process_memory_size=4
user=ducc
specification=DuccRawTextSpec.job


wholeTextFiles on 20 nodes

2014-11-23 Thread Simon Hafner
I have 20 nodes via EC2 and an application that reads the data via
wholeTextFiles. I've tried to copy the data into hadoop via
copyFromLocal, and I get

14/11/24 02:00:07 INFO hdfs.DFSClient: Exception in
createBlockOutputStream 172.31.2.209:50010 java.io.IOException: Bad
connect ack with firstBadLink as X:50010
14/11/24 02:00:07 INFO hdfs.DFSClient: Abandoning block
blk_-8725559184260876712_2627
14/11/24 02:00:07 INFO hdfs.DFSClient: Excluding datanode X:50010

a lot. Then I went the file system route via copy-dir, which worked
well. Now everything is under /root/txt on all nodes. I submitted the
job with the file:///root/txt/ directory for wholeTextFiles() and I
get

Exception in thread "main" java.io.FileNotFoundException: File does
not exist: /root/txt/3521.txt

The file exists on the root node and should be everywhere according to
copy-dir. The Hadoop variant worked fine with 3 nodes, but it starts
acting up with 20. I added

  <property>
    <name>dfs.datanode.max.transfer.threads</name>
    <value>4096</value>
  </property>

to hdfs-site.xml and core-site.xml, didn't help.




Re: DUCC 1.1.0- How to Run two DUCC version on same machines with different user

2014-11-17 Thread Simon Hafner
2014-11-17 0:00 GMT-06:00 reshu.agarwal reshu.agar...@orkash.com:
 I want to run two DUCC version i.e. 1.0.0 and 1.1.0 on same machines with
 different user. Can this be possible?
Yes, that should be possible. You'll have to make sure there are no
port conflicts; I'd guess the ActiveMQ port is hardcoded, while the rest
might be randomly assigned. Just set that port manually and watch out
for any errors during startup to see which other components have
hardcoded port numbers.

Personally, I'd just fire up a VM with qemu or VirtualBox.


Re: DUCC stuck at WaitingForResources on an Amazon Linux

2014-11-13 Thread Simon Hafner
 PM, Simon Hafner reactorm...@gmail.com wrote:

 4 shares total, 2 in use.

 2014-11-12 5:06 GMT-06:00 Lou DeGenaro lou.degen...@gmail.com:
  Try looking at your DUCC's web server.  On the System - Machines page
  do you see any shares not inuse?
 
  Lou.
 
  On Wed, Nov 12, 2014 at 5:51 AM, Simon Hafner reactorm...@gmail.com
 wrote:
  I've set up DUCC according to
  https://cwiki.apache.org/confluence/display/UIMA/DUCC
 
  ducc_install/bin/ducc_submit -f ducc_install/examples/simple/1.job
 
  the job is stuck at WaitingForResources.
 
  12 Nov 2014 10:37:30,175  INFO Agent.LinuxNodeMetricsProcessor -
  process N/A ... Agent Collecting User Processes
  12 Nov 2014 10:37:30,176  INFO Agent.NodeAgent -
  copyAllUserReservations N/A +++ Copying User Reservations
  - List Size:0
  12 Nov 2014 10:37:30,176  INFO Agent.LinuxNodeMetricsProcessor - call
 N/A ** User Process Map Size After
  copyAllUserReservations:0
  12 Nov 2014 10:37:30,176  INFO Agent.LinuxNodeMetricsProcessor - call
 N/A ** User Process Map Size After
  copyAllUserRougeProcesses:0
  12 Nov 2014 10:37:30,182  INFO Agent.LinuxNodeMetricsProcessor - call
N/A
  12 Nov 2014 10:37:30,182  INFO Agent.LinuxNodeMetricsProcessor - call
 N/A
 **
  12 Nov 2014 10:37:30,182  INFO Agent.LinuxNodeMetricsProcessor -
  process N/A ... Agent ip-172-31-7-237.us-west-2.compute.internal
  Posting Memory:4050676 Memory Free:4013752 Swap Total:0 Swap Free:0
  Low Swap Threshold Defined in ducc.properties:0
  12 Nov 2014 10:37:33,303  INFO Agent.AgentEventListener -
  reportIncomingStateForThisNode N/A Received OR Sequence:699 Thread
  ID:13
  12 Nov 2014 10:37:33,303  INFO Agent.AgentEventListener -
  reportIncomingStateForThisNode N/A
  JD-- JobId:6 ProcessId:0 PID:8168 Status:Running Resource
  State:Allocated isDeallocated:false
  12 Nov 2014 10:37:33,303  INFO Agent.NodeAgent - setReservations
  N/A +++ Copied User Reservations - List Size:0
  12 Nov 2014 10:37:33,405  INFO
  Agent.AgentConfiguration$$EnhancerByCGLIB$$cc49880b - getSwapUsage-
   N/A PID:8168 Swap Usage:0
  12 Nov 2014 10:37:33,913  INFO
  Agent.AgentConfiguration$$EnhancerByCGLIB$$cc49880b -
  collectProcessCurrentCPU N/A 0.0 == CPUTIME:0.0
  12 Nov 2014 10:37:33,913  INFO
  Agent.AgentConfiguration$$EnhancerByCGLIB$$cc49880b - process N/A
  --- PID:8168 Major Faults:0 Process Swap Usage:0 Max Swap
  Usage Allowed:-108574720 Time to Collect Swap Usage:0
 
  I'm using a t2.medium instance (2 CPU, ~ 4GB RAM) and the stock Amazon
  Linux (looks centos based).
 
  To install maven (not in the repos)
 
  #! /bin/bash
 
  TEMPORARY_DIRECTORY=$(mktemp -d)
  DOWNLOAD_TO=$TEMPORARY_DIRECTORY/maven.tgz
 
  echo 'Downloading Maven to: ' $DOWNLOAD_TO
 
  wget -O $DOWNLOAD_TO
 
 http://www.eng.lsu.edu/mirrors/apache/maven/maven-3/3.2.3/binaries/apache-maven-3.2.3-bin.tar.gz
 
  echo 'Extracting Maven'
  tar xzf $DOWNLOAD_TO -C $TEMPORARY_DIRECTORY
  rm $DOWNLOAD_TO
 
  echo 'Configuring Envrionment'
 
  mv $TEMPORARY_DIRECTORY/apache-maven-* /usr/local/maven
  echo -e 'export M2_HOME=/usr/local/maven\nexport
  PATH=${M2_HOME}/bin:${PATH}'  /etc/profile.d/maven.sh
  source /etc/profile.d/maven.sh
 
  echo 'The maven version: ' `mvn -version` ' has been installed.'
  echo -e '\n\n!! Note you must relogin to get mvn in your path !!'
  echo 'Removing the temporary directory...'
  rm -r $TEMPORARY_DIRECTORY
  echo 'Your Maven Installation is Complete.'



DUCC stuck at WaitingForResources on an Amazon Linux

2014-11-12 Thread Simon Hafner
I've set up DUCC according to
https://cwiki.apache.org/confluence/display/UIMA/DUCC

ducc_install/bin/ducc_submit -f ducc_install/examples/simple/1.job

the job is stuck at WaitingForResources.

12 Nov 2014 10:37:30,175  INFO Agent.LinuxNodeMetricsProcessor -
process N/A ... Agent Collecting User Processes
12 Nov 2014 10:37:30,176  INFO Agent.NodeAgent -
copyAllUserReservations N/A +++ Copying User Reservations
- List Size:0
12 Nov 2014 10:37:30,176  INFO Agent.LinuxNodeMetricsProcessor - call
   N/A ** User Process Map Size After
copyAllUserReservations:0
12 Nov 2014 10:37:30,176  INFO Agent.LinuxNodeMetricsProcessor - call
   N/A ** User Process Map Size After
copyAllUserRougeProcesses:0
12 Nov 2014 10:37:30,182  INFO Agent.LinuxNodeMetricsProcessor - call N/A
12 Nov 2014 10:37:30,182  INFO Agent.LinuxNodeMetricsProcessor - call
   N/A 
**
12 Nov 2014 10:37:30,182  INFO Agent.LinuxNodeMetricsProcessor -
process N/A ... Agent ip-172-31-7-237.us-west-2.compute.internal
Posting Memory:4050676 Memory Free:4013752 Swap Total:0 Swap Free:0
Low Swap Threshold Defined in ducc.properties:0
12 Nov 2014 10:37:33,303  INFO Agent.AgentEventListener -
reportIncomingStateForThisNode N/A Received OR Sequence:699 Thread
ID:13
12 Nov 2014 10:37:33,303  INFO Agent.AgentEventListener -
reportIncomingStateForThisNode N/A
JD-- JobId:6 ProcessId:0 PID:8168 Status:Running Resource
State:Allocated isDeallocated:false
12 Nov 2014 10:37:33,303  INFO Agent.NodeAgent - setReservations
N/A +++ Copied User Reservations - List Size:0
12 Nov 2014 10:37:33,405  INFO
Agent.AgentConfiguration$$EnhancerByCGLIB$$cc49880b - getSwapUsage-
 N/A PID:8168 Swap Usage:0
12 Nov 2014 10:37:33,913  INFO
Agent.AgentConfiguration$$EnhancerByCGLIB$$cc49880b -
collectProcessCurrentCPU N/A 0.0 == CPUTIME:0.0
12 Nov 2014 10:37:33,913  INFO
Agent.AgentConfiguration$$EnhancerByCGLIB$$cc49880b - process N/A
--- PID:8168 Major Faults:0 Process Swap Usage:0 Max Swap
Usage Allowed:-108574720 Time to Collect Swap Usage:0

I'm using a t2.medium instance (2 CPU, ~ 4GB RAM) and the stock Amazon
Linux (looks centos based).

To install maven (not in the repos)

#! /bin/bash

TEMPORARY_DIRECTORY=$(mktemp -d)
DOWNLOAD_TO=$TEMPORARY_DIRECTORY/maven.tgz

echo 'Downloading Maven to: ' $DOWNLOAD_TO

wget -O $DOWNLOAD_TO
http://www.eng.lsu.edu/mirrors/apache/maven/maven-3/3.2.3/binaries/apache-maven-3.2.3-bin.tar.gz

echo 'Extracting Maven'
tar xzf $DOWNLOAD_TO -C $TEMPORARY_DIRECTORY
rm $DOWNLOAD_TO

echo 'Configuring Envrionment'

mv $TEMPORARY_DIRECTORY/apache-maven-* /usr/local/maven
echo -e 'export M2_HOME=/usr/local/maven\nexport
PATH=${M2_HOME}/bin:${PATH}'  /etc/profile.d/maven.sh
source /etc/profile.d/maven.sh

echo 'The maven version: ' `mvn -version` ' has been installed.'
echo -e '\n\n!! Note you must relogin to get mvn in your path !!'
echo 'Removing the temporary directory...'
rm -r $TEMPORARY_DIRECTORY
echo 'Your Maven Installation is Complete.'


Re: DUCC stuck at WaitingForResources on an Amazon Linux

2014-11-12 Thread Simon Hafner
4 shares total, 2 in use.

2014-11-12 5:06 GMT-06:00 Lou DeGenaro lou.degen...@gmail.com:
 Try looking at your DUCC's web server.  On the System - Machines page
 do you see any shares not inuse?

 Lou.

 On Wed, Nov 12, 2014 at 5:51 AM, Simon Hafner reactorm...@gmail.com wrote:
 I've set up DUCC according to
 https://cwiki.apache.org/confluence/display/UIMA/DUCC

 ducc_install/bin/ducc_submit -f ducc_install/examples/simple/1.job

 the job is stuck at WaitingForResources.

 12 Nov 2014 10:37:30,175  INFO Agent.LinuxNodeMetricsProcessor -
 process N/A ... Agent Collecting User Processes
 12 Nov 2014 10:37:30,176  INFO Agent.NodeAgent -
 copyAllUserReservations N/A +++ Copying User Reservations
 - List Size:0
 12 Nov 2014 10:37:30,176  INFO Agent.LinuxNodeMetricsProcessor - call
N/A ** User Process Map Size After
 copyAllUserReservations:0
 12 Nov 2014 10:37:30,176  INFO Agent.LinuxNodeMetricsProcessor - call
N/A ** User Process Map Size After
 copyAllUserRougeProcesses:0
 12 Nov 2014 10:37:30,182  INFO Agent.LinuxNodeMetricsProcessor - call N/A
 12 Nov 2014 10:37:30,182  INFO Agent.LinuxNodeMetricsProcessor - call
N/A 
 **
 12 Nov 2014 10:37:30,182  INFO Agent.LinuxNodeMetricsProcessor -
 process N/A ... Agent ip-172-31-7-237.us-west-2.compute.internal
 Posting Memory:4050676 Memory Free:4013752 Swap Total:0 Swap Free:0
 Low Swap Threshold Defined in ducc.properties:0
 12 Nov 2014 10:37:33,303  INFO Agent.AgentEventListener -
 reportIncomingStateForThisNode N/A Received OR Sequence:699 Thread
 ID:13
 12 Nov 2014 10:37:33,303  INFO Agent.AgentEventListener -
 reportIncomingStateForThisNode N/A
 JD-- JobId:6 ProcessId:0 PID:8168 Status:Running Resource
 State:Allocated isDeallocated:false
 12 Nov 2014 10:37:33,303  INFO Agent.NodeAgent - setReservations
 N/A +++ Copied User Reservations - List Size:0
 12 Nov 2014 10:37:33,405  INFO
 Agent.AgentConfiguration$$EnhancerByCGLIB$$cc49880b - getSwapUsage-
  N/A PID:8168 Swap Usage:0
 12 Nov 2014 10:37:33,913  INFO
 Agent.AgentConfiguration$$EnhancerByCGLIB$$cc49880b -
 collectProcessCurrentCPU N/A 0.0 == CPUTIME:0.0
 12 Nov 2014 10:37:33,913  INFO
 Agent.AgentConfiguration$$EnhancerByCGLIB$$cc49880b - process N/A
 --- PID:8168 Major Faults:0 Process Swap Usage:0 Max Swap
 Usage Allowed:-108574720 Time to Collect Swap Usage:0

 I'm using a t2.medium instance (2 CPU, ~ 4GB RAM) and the stock Amazon
 Linux (looks centos based).

 To install maven (not in the repos)

 #! /bin/bash

 TEMPORARY_DIRECTORY=$(mktemp -d)
 DOWNLOAD_TO=$TEMPORARY_DIRECTORY/maven.tgz

 echo 'Downloading Maven to: ' $DOWNLOAD_TO

 wget -O $DOWNLOAD_TO
 http://www.eng.lsu.edu/mirrors/apache/maven/maven-3/3.2.3/binaries/apache-maven-3.2.3-bin.tar.gz

 echo 'Extracting Maven'
 tar xzf $DOWNLOAD_TO -C $TEMPORARY_DIRECTORY
 rm $DOWNLOAD_TO

 echo 'Configuring Envrionment'

 mv $TEMPORARY_DIRECTORY/apache-maven-* /usr/local/maven
 echo -e 'export M2_HOME=/usr/local/maven\nexport
 PATH=${M2_HOME}/bin:${PATH}'  /etc/profile.d/maven.sh
 source /etc/profile.d/maven.sh

 echo 'The maven version: ' `mvn -version` ' has been installed.'
 echo -e '\n\n!! Note you must relogin to get mvn in your path !!'
 echo 'Removing the temporary directory...'
 rm -r $TEMPORARY_DIRECTORY
 echo 'Your Maven Installation is Complete.'


log4j logging control via sbt

2014-11-05 Thread Simon Hafner
I've tried to set the log4j logger to warn-only via a log4j properties file in

cat src/test/resources/log4j.properties
log4j.logger.org.apache.spark=WARN

or in sbt via

javaOptions += "-Dlog4j.logger.org.apache.spark=WARN"

But the logger still gives me INFO messages to stdout when I run my tests via

sbt test

Is it the wrong option? I also tried

javaOptions += "-Dlog4j.rootLogger=warn"

but that doesn't seem to help either.
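
One more thing I'm wondering about: sbt only passes javaOptions to forked
JVMs, so maybe something like this is needed (untested sketch):

// build.sbt -- javaOptions only take effect when the test JVM is forked.
fork in Test := true

// Point log4j 1.x at the file explicitly instead of relying on classpath
// discovery (path is the one from above).
javaOptions in Test += "-Dlog4j.configuration=file:src/test/resources/log4j.properties"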




Spark with HLists

2014-10-29 Thread Simon Hafner
I tried using shapeless HLists as data storage for data inside spark.
Unsurprisingly, it failed. The deserialization isn't well-defined because of
all the implicits used by shapeless. How could I make it work?

Sample Code:

/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import shapeless._
import ops.hlist._

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "/tmp/README.md" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData
      .map(line => line :: HNil)
      .filter(_.select[String].contains("a"))
      .count()
    println("Lines with a: %s".format(numAs))
  }
}

Error:

Exception in thread "main" java.lang.NoClassDefFoundError:
shapeless/$colon$colon
at SimpleApp$.main(SimpleApp.scala:15)
at SimpleApp.main(SimpleApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)




[kde] Add certificates to Kopete

2014-04-15 Thread Simon Hafner
Hello

how do I add root certificates to Kopete? I added the CA cert to both Kleopatra
and the system ca-certificates, but Kopete still complains about an invalid SSL
certificate from the server due to an invalid root certificate.

Cheers,
Simon


[opensc-devel] pam_p11 (without pin) and ssh (with pin) on one card

2012-09-25 Thread Simon Hafner
Hey y'all

I have an ePass2003, and I'd like to use it for pam_p11 and ssh. The
pam_p11 key should be usable without a pin, or can I provide the pin
by using the password field? I'd like to know which paths are
possible. The other object stored is an ssh key secured by a pin.

My problem is now that I initialize my card with

pkcs15-init --create-pkcs15 --profile pkcs15+onepin

I only have one pin, but I'd like to have two auth-ids, one with and
one without pin.

-- Simon


More specific logging than xev

2012-08-15 Thread Simon Hafner
Hey

I'm implementing some EWMH with emacs. There is a function
(x-send-client-message) to send X messages. According to xev -root,
the root window receives

ClientMessage event, serial 20, synthetic YES, window 0x2200012,
message_type 0x13f (_NET_CURRENT_DESKTOP), format 32

but does not switch desktop. `wmctrl -s 1` gives the same message

ClientMessage event, serial 26, synthetic YES, window 0xab,
message_type 0x13f (_NET_CURRENT_DESKTOP), format 32

but it switches desktops. How can I debug that further?

Greetings
Simon Hafner


Re: Contextual modes for operators.

2012-08-15 Thread Simon Hafner
2012/8/15 Andreas Liljeqvist bon...@gmail.com:
 I am not really familiar with elisp though, what are the equivalence of
 multimethods as found in clojure for example?

According to #emacs, there is eieio, if you associate multimethods
with dynamic dispatch.

 cW stopping in the case of structure(parenthesis etc) when in lispy-modes.

I'm not sure, this might give some WTF situations. parenthesis
movement is covered by `%`. But making `%` mode-aware to move between
e.g. `do` and `end` in ruby-mode would be awesome.

 gd Using whatever we have available for definition.

semantic-mode might help here.

 i Going to the first available actual input-area. (mostly for repls)

Good idea.



Re: [Voyage-linux] Can't boot voyage linux 0.8.5

2012-08-14 Thread Simon Hafner
liquid liquidee@... writes:
 I am trying to install voyage linux 0.8.5 on an Alix 3d2 board via pxe 
 boot. I can see that it loads the image properly, but the boot process 
 hangs on Switching to clocksource tsc. [...]

I've got the same problem with voyage 0.8.5, I'd appreciate help.




[Moses-support] extract translation table

2012-06-03 Thread Simon Hafner
Hi all

is there a nice way to get the top 100 translations?

I'm trying to compare two languages on character ngram level, to find
common edit paths. The idea is to train moses for that pair and then
extract the most common ngram pairs. Is this even possible or are they
normalized based on their occurrence?
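
To make "top 100" concrete, something as crude as this would already do,
assuming the usual `source ||| target ||| scores` phrase-table layout (which
score column to sort by needs checking against the Moses docs):

import scala.io.Source

object TopTranslations {
  def main(args: Array[String]): Unit = {
    // args(0): path to an uncompressed phrase table
    val top = Source.fromFile(args(0)).getLines()
      .map(_.split("""\s*\|\|\|\s*"""))
      .collect { case Array(src, tgt, scores, _*) =>
        (src, tgt, scores.trim.split("\\s+")(0).toDouble)
      }
      .toSeq
      .sortBy(-_._3)
      .take(100)

    top.foreach { case (src, tgt, score) => println(f"$score%.4f\t$src\t$tgt") }
  }
}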

-- Simon


Re: lucene-core-3.3.0 not optimizing

2011-12-02 Thread Simon Hafner

On 02.12.2011, at 04:54, KARTHIK SHIVAKUMAR wrote:

 Hi
 
 Spec
 O/s win os 7
 Jdk : 1.6.0_29
 Lucene  lucene-core-3.3.0
 
 Finally after Indexing successfully ,Why this Code does not optimize (
 sample code )
 
INDEX_WRITER.optimize(100);
INDEX_WRITER.commit();
INDEX_WRITER.close();

Because you shouldn't use it [1] in general. :-)

Simon

[1] https://issues.apache.org/jira/browse/LUCENE-3454



Re: [rspec-users] RSpec newbie ran rspec 2.4 on WinXP, got HOME undefined

2011-01-10 Thread Simon Hafner
On Monday 10 January 2011 22.37:46 RichardOnRails wrote:
 Hi,
 
 I'm running WinXP-Pro/SP3  Ruby 1.8.6
 
 K:/_Utilities/ruby186-26_rc2/ruby/lib/ruby/gems/1.8/gems/rspec-
 core-2.4.0/lib/rspec/core/configuration_options.rb:9:couldn't find
 HOME environment -- expanding `~' (ArgumentError)
 
 I checked my environment variables for HOME; it is undefined.

Sounds like rspec is relying on some *NIX conventions here (HOME usually points
to a user's home directory). Google how to set environment variables and set
HOME to your home directory (usually something like
C:\Documents and Settings\<user>).

 Any advice would be gratefully received.
 --
 Richard