Re: Problems concerning implementing machine learning algorithm from scratch based on Spark

2014-12-30 Thread MEETHU MATHEW
Hi, The GMMSpark.py you mentioned is the old one. The new code is now added to spark-packages and is available at http://spark-packages.org/package/11 . Have a look at the new code. We have used numpy functions in our code and didn't notice any slowdown because of this. Thanks & Regards, Meethu M
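For illustration, a minimal sketch of the numpy-inside-PySpark pattern Meethu describes (this is not the GMMSpark.py / spark-packages code): Gaussian log-densities are computed per point with numpy inside an RDD map. The parameter values and helper name here are hypothetical.

# Minimal sketch, not the spark-packages GMM implementation: evaluate a
# single Gaussian's log-density for each point with numpy inside an RDD map.
import numpy as np
from pyspark import SparkContext

def log_gaussian(x, mu, inv_cov, log_det):
    # standard multivariate normal log-density; the heavy arithmetic stays in numpy
    d = x - mu
    return -0.5 * (len(x) * np.log(2 * np.pi) + log_det + d.dot(inv_cov).dot(d))

sc = SparkContext(appName="numpy-in-rdd-sketch")
points = sc.parallelize([np.array([1.0, 2.0]), np.array([0.5, -1.0])])

mu = np.array([0.0, 0.0])
cov = np.eye(2)
inv_cov = np.linalg.inv(cov)
log_det = np.log(np.linalg.det(cov))

densities = points.map(lambda x: log_gaussian(x, mu, inv_cov, log_det)).collect()
print(densities)

Because numpy does the per-record arithmetic in compiled code, the Python-side overhead of a map like this is usually modest, which is consistent with the "no slowdown" observation above.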

Re: Registering custom metrics

2014-12-30 Thread eshioji
Hi, Did you find a way to do this, or are you working on it? I am trying to find a way to do this as well, but haven't been able to.

Is there any way to tell if compute is being called from a retry?

2014-12-30 Thread Cody Koeninger
It looks like taskContext.attemptId doesn't mean what one thinks it might mean, based on http://apache-spark-developers-list.1001551.n3.nabble.com/Get-attempt-number-in-a-closure-td8853.html and the unresolved https://issues.apache.org/jira/browse/SPARK-4014 . Is there any alternative way to

Re: Is there any way to tell if compute is being called from a retry?

2014-12-30 Thread Josh Rosen
This is timely, since I just ran into this issue myself while trying to write a test to reproduce a bug related to speculative execution (I wanted to configure a job so that the first attempt to compute a partition would run slow so that a second, fast speculative copy would be launched). I've
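A rough sketch of the kind of setup Josh describes, assuming speculation is simply enabled and one task is made a deliberate straggler; the configuration values are illustrative. Note that without a reliable way to detect a retry or speculative attempt (the subject of this thread), the task function cannot behave differently on the second copy.

# Rough sketch: trigger a speculative second copy of a straggler task.
# Without a way to detect retries/speculative attempts (the open question
# in this thread), the function below cannot run "fast" only on the copy.
import time
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("speculation-sketch")
        .set("spark.speculation", "true")           # enable speculative execution
        .set("spark.speculation.interval", "100")   # check for stragglers frequently
        .set("spark.speculation.quantile", "0.5"))  # speculate once half the tasks finish
sc = SparkContext(conf=conf)

def work(x):
    if x == 0:
        time.sleep(60)  # deliberate straggler; a speculative copy should be launched for it
    return x

print(sc.parallelize(range(100), 100).map(work).count())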

Re: Unsupported Catalyst types in Parquet

2014-12-30 Thread Alessandro Baretta
Gents, I tried #3820. It doesn't work. I'm still getting the following exceptions: Exception in thread "Thread-45" java.lang.RuntimeException: Unsupported datatype DateType at scala.sys.package$.error(package.scala:27) at

Re: Unsupported Catalyst types in Parquet

2014-12-30 Thread Alessandro Baretta
Sorry! My bad. I had stale Spark jars sitting on the slave nodes... Alex On Tue, Dec 30, 2014 at 4:39 PM, Alessandro Baretta alexbare...@gmail.com wrote: Gents, I tried #3820. It doesn't work. I'm still getting the following exceptions: Exception in thread "Thread-45"

Re: Adding third party jars to classpath used by pyspark

2014-12-30 Thread Davies Liu
On Mon, Dec 29, 2014 at 7:39 PM, Jeremy Freeman freeman.jer...@gmail.com wrote: Hi Stephen, it should be enough to include --jars /path/to/file.jar in the command line call to either pyspark or spark-submit, as in spark-submit --master local --jars /path/to/file.jar myfile.py
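As a hedged illustration of Jeremy's suggestion, a sketch of what using such a jar from PySpark might look like; com.example.MyHelper is a hypothetical class assumed to live inside file.jar, and on some versions the jar may additionally need to be on the driver classpath (e.g. via --driver-class-path).

# Submit with the third-party jar on the classpath, e.g.:
#   spark-submit --master local --jars /path/to/file.jar myfile.py
# com.example.MyHelper is a hypothetical class assumed to be inside file.jar.
from pyspark import SparkContext

sc = SparkContext(appName="jar-access-sketch")

# Access the JVM-side class through the py4j gateway exposed on the SparkContext.
helper = sc._jvm.com.example.MyHelper()
print(helper.toString())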

Re: Help, pyspark.sql.List flatMap results become tuple

2014-12-30 Thread Davies Liu
This should be fixed in 1.2, could you try it? On Mon, Dec 29, 2014 at 8:04 PM, guoxu1231 guoxu1...@gmail.com wrote: Hi pyspark guys, I have a json file, and its structure is like below: {"NAME":"George", "AGE":35, "ADD_ID":1212, "POSTAL_AREA":1, "TIME_ZONE_ID":1, "INTEREST":[{"INTEREST_NO":1, "INFO":"x"},
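For reference, a small sketch of the pattern under discussion, assuming a hypothetical people.json file with the structure shown above: load the JSON with SQLContext and flatMap over the nested INTEREST list so each inner record becomes its own element.

# Sketch of the pattern in question: load the JSON, then flatMap over the
# nested INTEREST list. people.json is a hypothetical path; field names
# follow the example in the message above.
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="json-flatmap-sketch")
sqlContext = SQLContext(sc)

people = sqlContext.jsonFile("people.json")    # infers the nested schema
interests = people.flatMap(lambda row: row.INTEREST)

# In 1.2 each element should come back as a Row (e.g. Row(INTEREST_NO=1, INFO=u'x'))
# rather than a plain tuple, which was the issue reported against earlier versions.
print(interests.collect())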

Re: Help, pyspark.sql.List flatMap results become tuple

2014-12-30 Thread guoxu1231
Thanks Davies, it works in 1.2.

Why the major.minor version of the new hive-exec is 51.0?

2014-12-30 Thread Shixiong Zhu
The major.minor version of the new org.spark-project.hive hive-exec is 51.0, so it will require people to use JDK 7. Is it intentional? <dependency> <groupId>org.spark-project.hive</groupId> <artifactId>hive-exec</artifactId> <version>0.12.0-protobuf-2.5</version> </dependency> You can use the following steps to

Sample Spark Program Error

2014-12-30 Thread Naveen Madhire
Hi All, I am trying to run a sample Spark program using Scala SBT. Below is the program: def main(args: Array[String]) { val logFile = "E:/ApacheSpark/usb/usb/spark/bin/README.md" // Should be some file on your system val sc = new SparkContext("local", "Simple App",

Re: Why the major.minor version of the new hive-exec is 51.0?

2014-12-30 Thread Ted Yu
I extracted org/apache/hadoop/hive/common/CompressionUtils.class from the jar and used hexdump to view the class file. Bytes 6 and 7 are 00 and 33, respectively. According to http://en.wikipedia.org/wiki/Java_class_file, the jar was produced using Java 7. FYI On Tue, Dec 30, 2014 at 8:09 PM,
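A sketch of the same check Ted describes, done with Python instead of hexdump, assuming the hive-exec jar is available locally under the name used below: read the class-file header out of the jar and interpret bytes 6-7 as the major version (0x33 = 51 = Java 7).

# Sketch of Ted's check without hexdump: bytes 4-5 of a .class file are the
# minor version and bytes 6-7 the major version (0x33 == 51 == Java 7).
import struct
import zipfile

jar_path = "hive-exec-0.12.0-protobuf-2.5.jar"  # assumed local path to the jar
class_entry = "org/apache/hadoop/hive/common/CompressionUtils.class"

with zipfile.ZipFile(jar_path) as jar:
    header = jar.read(class_entry)[:8]

magic, minor, major = struct.unpack(">IHH", header)
assert magic == 0xCAFEBABE
# 50 -> Java 6, 51 -> Java 7, 52 -> Java 8
print("major.minor = %d.%d" % (major, minor))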

Re: Sample Spark Program Error

2014-12-30 Thread Nicholas Chammas
You sent this to the dev list. Please send it instead to the user list. We use the dev list to discuss development on Spark itself, new features, fixes to known bugs, and so forth. The user list is to discuss issues using Spark, which I believe is what you are looking for. Nick On Tue Dec 30

Re: Unsupported Catalyst types in Parquet

2014-12-30 Thread Alessandro Baretta
Here's a more meaningful exception: java.lang.ClassCastException: org.apache.spark.sql.catalyst.types.DateType$ cannot be cast to org.apache.spark.sql.catalyst.types.PrimitiveType at org.apache.spark.sql.parquet.RowWriteSupport.writeValue(ParquetTableSupport.scala:188) at

Re: Unsupported Catalyst types in Parquet

2014-12-30 Thread Alessandro Baretta
I think I might have figured it out myself. Here's a pull request for you guys to check out: https://github.com/apache/spark/pull/3855 . I successfully tested this code on my cluster. On Tue, Dec 30, 2014 at 11:01 PM, Alessandro Baretta alexbare...@gmail.com wrote: Here's a more meaningful