Re: What's the use of RangePartitioner.hashCode

2016-09-22 Thread Jakob Odersky
Hash codes should try to avoid collisions of objects that are not equal. Integer overflowing is not an issue by itself. On Wed, Sep 21, 2016 at 10:49 PM, WangJianfei wrote: > Thank you very much sir! but what i want to know is whether the hashcode > overflow will

Re: Apache Spark JavaRDD pipe() need help

2016-09-21 Thread Jakob Odersky
Can you provide more details? It's unclear what you're asking. On Wed, Sep 21, 2016 at 10:14 AM, shashikant.kulka...@gmail.com wrote: > Hi All, > > I am trying to use the JavaRDD.pipe() API. > > I have one object with me from the JavaRDD

Re: What's the use of RangePartitioner.hashCode

2016-09-21 Thread Jakob Odersky
t a.hashCode == b.hashCode when > a.equals(b), the bidirectional case is usually harder to satisfy due to > possibility of collisions. > > Good info: > http://www.programcreek.com/2011/07/java-equals-and-hashcode-contract/ > _____ > From: Jakob Odersky <

Re: What's the use of RangePartitioner.hashCode

2016-09-21 Thread Jakob Odersky
Hi, It is used jointly with a custom implementation of the `equals` method. In Scala, you can override the `equals` method to change the behaviour of `==` comparison. One example of this would be to compare classes based on their parameter values (i.e. what case classes do). Partitioners aren't
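A minimal pure-Scala sketch of the point made in this thread (the `Bucket` class and its field are made up for illustration; it only mimics how a partitioner might compare by parameter value, the way case classes do automatically):

```scala
// A hypothetical class that compares instances by parameter value.
class Bucket(val numPartitions: Int) {
  // Overriding `equals` changes the behaviour of `==` in Scala.
  override def equals(other: Any): Boolean = other match {
    case b: Bucket => b.numPartitions == numPartitions
    case _         => false
  }
  // The contract: a.equals(b) must imply a.hashCode == b.hashCode,
  // which is why a custom equals needs a matching hashCode.
  override def hashCode: Int = numPartitions.hashCode
}

val a = new Bucket(4)
val b = new Bucket(4)
println(a == b)                   // prints true: == delegates to equals
println(a.hashCode == b.hashCode) // prints true: contract satisfied
```

Note the reverse direction (equal hash codes implying equality) is not required and generally cannot hold, since hash codes can collide.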

Re: Has anyone installed the scala kernel for Jupyter notebook

2016-09-21 Thread Jakob Odersky
One option would be to use Apache Toree. A quick setup guide can be found here https://toree.incubator.apache.org/documentation/user/quick-start On Wed, Sep 21, 2016 at 2:02 PM, Arif,Mubaraka wrote: > Has anyone installed the scala kernel for Jupyter notebook. > > > > Any

Re: Task Deserialization Error

2016-09-21 Thread Jakob Odersky
Your app is fine; I think the error has to do with the way IntelliJ launches applications. Is your app forked in a new JVM when you run it? On Wed, Sep 21, 2016 at 2:28 PM, Gokula Krishnan D wrote: > Hello Sumit - > > I could see that SparkConf() specification is not being

Re: java.lang.NoClassDefFoundError, is this a bug?

2016-09-21 Thread Jakob Odersky
Hi Xiang, this error also appears in client mode (maybe the situation that you were referring to and that worked was local mode?), however the error is expected and is not a bug. this line in your snippet: object Main extends A[String] { //... is, after desugaring, equivalent to: object

[jira] [Commented] (SPARK-16264) Allow the user to use operators on the received DataFrame

2016-09-15 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15494587#comment-15494587 ] Jakob Odersky commented on SPARK-16264: --- I just came across this issue through a comment

Re: Can I assign affinity for spark executor processes?

2016-09-13 Thread Jakob Odersky
Hi Xiaoye, could it be that the executors were spawned before the affinity was set on the worker? Would it help to start the Spark worker with taskset from the beginning, i.e. "taskset [mask] start-slave.sh"? Workers in Spark (standalone mode) simply create processes with the standard java process

Re: Test fails when compiling spark with tests

2016-09-13 Thread Jakob Odersky
There are some flaky tests that occasionally fail, so my first recommendation would be to re-run the test suite. Another thing to check is whether any applications are listening to Spark's default ports. Btw, what is your environment like? In case it is Windows, I don't think tests are regularly run

[jira] [Commented] (SPARK-14221) Cross-publish Chill for Scala 2.12

2016-09-09 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478428#comment-15478428 ] Jakob Odersky commented on SPARK-14221: --- I just saw that chill already [has a pending PR to upgrade

Re: iterating over DataFrame Partitions sequentially

2016-09-09 Thread Jakob Odersky
st use-case of Spark though and will probably be a performance bottleneck. On Fri, Sep 9, 2016 at 11:45 AM, Jakob Odersky <ja...@odersky.com> wrote: > Hi Sujeet, > > going sequentially over all parallel, distributed data seems like a > counter-productive thing to do. What are you

Re: iterating over DataFrame Partitions sequentially

2016-09-09 Thread Jakob Odersky
Hi Sujeet, going sequentially over all parallel, distributed data seems like a counter-productive thing to do. What are you trying to accomplish? regards, --Jakob On Fri, Sep 9, 2016 at 3:29 AM, sujeet jog wrote: > Hi, > Is there a way to iterate over a DataFrame with n

[jira] [Comment Edited] (SPARK-14221) Cross-publish Chill for Scala 2.12

2016-09-09 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15477903#comment-15477903 ] Jakob Odersky edited comment on SPARK-14221 at 9/9/16 6:30 PM: --- [~joshrosen

[jira] [Commented] (SPARK-14221) Cross-publish Chill for Scala 2.12

2016-09-09 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15477903#comment-15477903 ] Jakob Odersky commented on SPARK-14221: --- [~joshrosen]'s upstream PR requires Kryo 3.1, a version

Re: @scala.annotation.varargs or @_root_.scala.annotation.varargs?

2016-09-08 Thread Jakob Odersky
+1 to Sean's answer: import varargs. In this case the _root_ is also unnecessary (it would be required only if you were using it in a nested package called "scala" itself). On Thu, Sep 8, 2016 at 9:27 AM, Sean Owen wrote: > I think the @_root_ version is redundant because >
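A small sketch of the imported form being discussed (the `Util` class and `join` method are hypothetical; only the annotation usage is the point):

```scala
import scala.annotation.varargs

class Util {
  // @varargs tells the compiler to also emit a Java-style `String...`
  // overload, so Java callers can invoke the method without building a Seq.
  @varargs
  def join(sep: String, parts: String*): String = parts.mkString(sep)
}

println(new Util().join("-", "a", "b", "c")) // prints a-b-c
```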

Re: Returning DataFrame as Scala method return type

2016-09-08 Thread Jakob Odersky
(Maybe unrelated FYI): in case you're using only Scala or Java with Spark, I would recommend to use Datasets instead of DataFrames. They provide exactly the same functionality, yet offer more type-safety. On Thu, Sep 8, 2016 at 11:05 AM, Lee Becker wrote: > > On Thu, Sep

[jira] [Commented] (SPARK-17368) Scala value classes create encoder problems and break at runtime

2016-09-07 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471428#comment-15471428 ] Jakob Odersky commented on SPARK-17368: --- Hmm, you're right my assumption was of using only value

[jira] [Comment Edited] (SPARK-17368) Scala value classes create encoder problems and break at runtime

2016-09-06 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468833#comment-15468833 ] Jakob Odersky edited comment on SPARK-17368 at 9/6/16 10:57 PM: So I

[jira] [Commented] (SPARK-17368) Scala value classes create encoder problems and break at runtime

2016-09-06 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468833#comment-15468833 ] Jakob Odersky commented on SPARK-17368: --- So I thought about this a bit more and although

[jira] [Commented] (SPARK-17368) Scala value classes create encoder problems and break at runtime

2016-09-02 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15459707#comment-15459707 ] Jakob Odersky commented on SPARK-17368: --- Yeah macros would be awesome, something with Scala.meta

[jira] [Commented] (SPARK-17368) Scala value classes create encoder problems and break at runtime

2016-09-02 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15459587#comment-15459587 ] Jakob Odersky commented on SPARK-17368: --- I'm currently taking a look at this but my first analysis

Re: help getting started

2016-09-02 Thread Jakob Odersky
Hi Dayne, you can look at this page for some starter issues: https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20labels%20%3D%20Starter%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened). Also check out this guide on how to contribute to Spark

Re: Dataset encoder for java.time.LocalDate?

2016-09-02 Thread Jakob Odersky
Spark currently requires at least Java 1.7, so adding a Java 1.8-specific encoder will not be straightforward without affecting requirements. I can think of two solutions: 1. add a Java 1.8 build profile which includes such encoders (this may be useful for Scala 2.12 support in the future as

[jira] [Commented] (SPARK-17367) Cannot define value classes in REPL

2016-09-02 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15457862#comment-15457862 ] Jakob Odersky commented on SPARK-17367: --- You're absolutely correct, it is a Scala issue. I raised

Re: Scala Vs Python

2016-09-02 Thread Jakob Odersky
Forgot to answer your question about feature parity of Python w.r.t. Spark's different components: I mostly work with Scala so I can't say for sure, but I think that all pre-2.0 features (that's basically everything except Structured Streaming) are on par. Structured Streaming is a pretty new

Re: Scala Vs Python

2016-09-02 Thread Jakob Odersky
As you point out, often the reason that Python support lags behind is that functionality is implemented in Scala, so the API in that language is "free" whereas Python support needs to be added explicitly. Nevertheless, Python bindings are an important part of Spark and are used by many people (this

[jira] [Comment Edited] (SPARK-17368) Scala value classes create encoder problems and break at runtime

2016-09-01 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456966#comment-15456966 ] Jakob Odersky edited comment on SPARK-17368 at 9/1/16 11:48 PM: FYI

[jira] [Commented] (SPARK-17368) Scala value classes create encoder problems and break at runtime

2016-09-01 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456966#comment-15456966 ] Jakob Odersky commented on SPARK-17368: --- FYI the issue also occurs for top-level value classes (i.e

Re: Possible Code Generation Bug: Can Spark 2.0 Datasets handle Scala Value Classes?

2016-09-01 Thread Jakob Odersky
I'm not sure how the shepherd thing works, but just FYI Michael Armbrust originally wrote Catalyst, the engine behind Datasets. You can find a list of all committers here https://cwiki.apache.org/confluence/display/SPARK/Committers. Another good resource is to check https://spark-prs.appspot.com/

Re: Scala Vs Python

2016-09-01 Thread Jakob Odersky
http://talebzadehmich.wordpress.com *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may

Re: Scala Vs Python

2016-09-01 Thread Jakob Odersky
> However, what really worries me is not having Dataset APIs at all in Python. I think thats a deal breaker. What is the functionality you are missing? In Spark 2.0 a DataFrame is just an alias for Dataset[Row] ("type DataFrame = Dataset[Row]" in core/.../o/a/s/sql/package.scala). Since python is

Re: Possible Code Generation Bug: Can Spark 2.0 Datasets handle Scala Value Classes?

2016-09-01 Thread Jakob Odersky
Hi Aris, thanks for sharing this issue. I can confirm that value classes currently don't work, however I can't think of reason why they shouldn't be supported. I would therefore recommend that you report this as a bug. (Btw, value classes also currently aren't definable in the REPL. See

[jira] [Created] (SPARK-17367) Cannot define value classes in REPL

2016-09-01 Thread Jakob Odersky (JIRA)
Jakob Odersky created SPARK-17367: - Summary: Cannot define value classes in REPL Key: SPARK-17367 URL: https://issues.apache.org/jira/browse/SPARK-17367 Project: Spark Issue Type: Bug

Re: How to use custom class in DataSet

2016-08-30 Thread Jakob Odersky
Implementing custom encoders is unfortunately not well supported at the moment (IIRC there are plans to eventually add an api for user defined encoders). That being said, there are a couple of encoders that can work with generic, serializable data types: "javaSerialization" and "kryo", found here

[jira] [Comment Edited] (SPARK-17103) Can not define class variable in repl

2016-08-17 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15424228#comment-15424228 ] Jakob Odersky edited comment on SPARK-17103 at 8/17/16 5:28 PM: That's

[jira] [Comment Edited] (SPARK-17103) Can not define class variable in repl

2016-08-17 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15424228#comment-15424228 ] Jakob Odersky edited comment on SPARK-17103 at 8/17/16 9:53 AM: That's

[jira] [Comment Edited] (SPARK-17103) Can not define class variable in repl

2016-08-17 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15424228#comment-15424228 ] Jakob Odersky edited comment on SPARK-17103 at 8/17/16 9:52 AM: That's

[jira] [Commented] (SPARK-17103) Can not define class variable in repl

2016-08-17 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15424228#comment-15424228 ] Jakob Odersky commented on SPARK-17103: --- That's true, the spark repl is basically just a thin

[jira] [Commented] (SPARK-17095) Latex and Scala doc do not play nicely

2016-08-16 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423602#comment-15423602 ] Jakob Odersky commented on SPARK-17095: --- Since this bug also occurs when there are no opening

Re: Error in Word Count Program

2016-07-19 Thread Jakob Odersky
Does the file /home/user/spark-1.5.1-bin-hadoop2.4/bin/README.md exist? On Tue, Jul 19, 2016 at 4:30 AM, RK Spark wrote: > val textFile = sc.textFile("README.md")val linesWithSpark = > textFile.filter(line => line.contains("Spark")) >

Re: I'm trying to understand how to compile Spark

2016-07-19 Thread Jakob Odersky
Hi Eli, to build spark, just run build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests package in your source directory, where package is the actual word "package". This will recompile the whole project, so it may take a while when running the first time. Replacing a single file

[jira] [Commented] (SPARK-15014) Spark Shell could use Ammonite Shell

2016-05-24 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299168#comment-15299168 ] Jakob Odersky commented on SPARK-15014: --- You might still have some issues with classloaders, I

[jira] [Commented] (SPARK-15014) Spark Shell could use Ammonite Shell

2016-05-24 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299118#comment-15299118 ] Jakob Odersky commented on SPARK-15014: --- spark-shell is a very thin wrapper around the standard

Re: why spark 1.6 use Netty instead of Akka?

2016-05-23 Thread Jakob Odersky
Spark actually used to depend on Akka. Unfortunately this brought in all of Akka's dependencies (in addition to Spark's already quite complex dependency graph) and, as Todd mentioned, led to conflicts with projects using both Spark and Akka. It would probably be possible to use Akka and shade it

Re: SBT doesn't pick resource file after clean

2016-05-20 Thread Jakob Odersky
implemented. > However, even on generating the file under the default resourceDirectory => core/src/resources doesn't pick the file in jar after doing a clean. So this seems to be a different issue. On Thu, May 19, 2016 at 4:17 PM, Jakob Oders

Re: SBT doesn't pick resource file after clean

2016-05-19 Thread Jakob Odersky
To echo my comment on the PR: I think the "sbt way" to add extra, generated resources to the classpath is by adding a new task to the `resourceGenerators` setting. Also, the task should output any files into the directory specified by the `resourceManaged` setting. See
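A hedged sbt sketch of that suggestion (the generated file name and contents are hypothetical; syntax is the 0.13-era style current at the time):

```scala
// build.sbt -- register a resource generator so the file is re-created
// after `clean` and ends up on the classpath / in the packaged jar.
resourceGenerators in Compile += Def.task {
  // resourceManaged is the designated output directory for generated resources
  val out = (resourceManaged in Compile).value / "build-info.properties"
  IO.write(out, s"version=${version.value}")
  Seq(out) // the task must return the files it produced
}.taskValue
```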

[jira] [Commented] (SPARK-13581) LibSVM throws MatchError

2016-05-18 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289738#comment-15289738 ] Jakob Odersky commented on SPARK-13581: --- I can't reproduce it anymore either > LibSVM thr

[jira] [Comment Edited] (SPARK-13581) LibSVM throws MatchError

2016-05-18 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289738#comment-15289738 ] Jakob Odersky edited comment on SPARK-13581 at 5/18/16 8:26 PM: I can't

[jira] [Commented] (SPARK-14519) Cross-publish Kafka for Scala 2.12.0-M4

2016-04-26 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259100#comment-15259100 ] Jakob Odersky commented on SPARK-14519: --- That sounds reasonable, however should the parent JIRA

[jira] [Commented] (SPARK-14146) Imported implicits can't be found in Spark REPL in some cases

2016-04-26 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259079#comment-15259079 ] Jakob Odersky commented on SPARK-14146: --- the reason this fails is because spark-shell sets

[jira] [Commented] (SPARK-14519) Cross-publish Kafka for Scala 2.12.0-M4

2016-04-26 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258974#comment-15258974 ] Jakob Odersky commented on SPARK-14519: --- From a reply in the mailing list archive (14/4/2

[jira] [Commented] (SPARK-14417) Cleanup Scala deprecation warnings once we drop 2.10.X

2016-04-26 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258883#comment-15258883 ] Jakob Odersky commented on SPARK-14417: --- I suggested that Arun add the JIRA in the title and close

[jira] [Commented] (SPARK-14511) Publish our forked genjavadoc for 2.12.0-M4 or stop using a forked version

2016-04-26 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258702#comment-15258702 ] Jakob Odersky commented on SPARK-14511: --- release is out, pr has been submitted > Publish

[jira] [Commented] (SPARK-14511) Publish our forked genjavadoc for 2.12.0-M4 or stop using a forked version

2016-04-25 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257217#comment-15257217 ] Jakob Odersky commented on SPARK-14511: --- Update: an issue was discovered during release-testing

[jira] [Commented] (SPARK-10001) Allow Ctrl-C in spark-shell to kill running job

2016-04-20 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251304#comment-15251304 ] Jakob Odersky commented on SPARK-10001: --- FYI, I took up the issue (previous pr #8216) > Allow C

[jira] [Commented] (SPARK-14511) Publish our forked genjavadoc for 2.12.0-M4 or stop using a forked version

2016-04-18 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246336#comment-15246336 ] Jakob Odersky commented on SPARK-14511: --- cf https://github.com/typesafehub/genjavadoc/issues/73 I

[jira] [Commented] (SPARK-7992) Hide private classes/objects in in generated Java API doc

2016-04-15 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242725#comment-15242725 ] Jakob Odersky commented on SPARK-7992: -- [~mengxr], The PR is finally in! Let's hope upstream makes

Re: I want to unsubscribe

2016-04-05 Thread Jakob Odersky
To unsubscribe, send an email to user-unsubscr...@spark.apache.org On Tue, Apr 5, 2016 at 4:50 PM, Ranjana Rajendran wrote: > I get to see the threads in the public mailing list. I don't want so many > messages in my inbox. I want to unsubscribe.

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-04-04 Thread Jakob Odersky
I just found out how the hash is calculated: gpg --print-md sha512 .tgz you can use that to check if the resulting output matches the contents of .tgz.sha On Mon, Apr 4, 2016 at 3:19 PM, Jakob Odersky <ja...@odersky.com> wrote: > The published hash is a SHA512. > > You can verif

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-04-04 Thread Jakob Odersky
Is someone going to retry fixing these packages? It's still a problem. > Also, it would be good to understand why this is happening. On Fri, Mar 18, 2016 at 6:49 PM Jakob Odersky <ja...@odersky.com> wrote:

[jira] [Commented] (SPARK-7992) Hide private classes/objects in in generated Java API doc

2016-03-28 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215315#comment-15215315 ] Jakob Odersky commented on SPARK-7992: -- [~mengxr], I just submitted [another PR|https://github.com

Re: [discuss] ending support for Java 7 in Spark 2.0

2016-03-24 Thread Jakob Odersky
I mean from the perspective of someone developing Spark, it makes things more complicated. It's just my point of view, people that actually support Spark deployments may have a different opinion ;) On Thu, Mar 24, 2016 at 2:41 PM, Jakob Odersky <ja...@odersky.com> wrote: > You can, but s

Re: [discuss] ending support for Java 7 in Spark 2.0

2016-03-24 Thread Jakob Odersky
You can, but since it's going to be a maintainability issue I would argue it is in fact a problem. On Thu, Mar 24, 2016 at 2:34 PM, Marcelo Vanzin <van...@cloudera.com> wrote: > Hi Jakob, > > On Thu, Mar 24, 2016 at 2:29 PM, Jakob Odersky <ja...@odersky.com> wrote: &

Re: [discuss] ending support for Java 7 in Spark 2.0

2016-03-24 Thread Jakob Odersky
Reynold's 3rd point is particularly strong in my opinion. Supporting Scala 2.12 will require Java 8 anyway, and introducing such a change is probably best done in a major release. Consider what would happen if Spark 2.0 doesn't require Java 8 and hence not support Scala 2.12. Will it be stuck on

[jira] [Comment Edited] (SPARK-7992) Hide private classes/objects in in generated Java API doc

2016-03-23 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209280#comment-15209280 ] Jakob Odersky edited comment on SPARK-7992 at 3/23/16 10:16 PM: Hey

[jira] [Commented] (SPARK-7992) Hide private classes/objects in in generated Java API doc

2016-03-23 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209280#comment-15209280 ] Jakob Odersky commented on SPARK-7992: -- Hey Xiangrui, you caught me in a very busy time last week

Re: Building spark submodule source code

2016-03-21 Thread Jakob Odersky
Another gotcha to watch out for are the SPARK_* environment variables. Have you exported SPARK_HOME? In that case, 'spark-shell' will use Spark from the variable, regardless of the place the script is called from. I.e. if SPARK_HOME points to a release version of Spark, your code changes will

Re: Can't zip RDDs with unequal numbers of partitions

2016-03-20 Thread Jakob Odersky
Can you share a snippet that reproduces the error? What was spark.sql.autoBroadcastJoinThreshold before your last change? On Thu, Mar 17, 2016 at 10:03 AM, Jiří Syrový wrote: > Hi, > > any idea what could be causing this issue? It started appearing after > changing

Re: ClassNotFoundException in RDD.map

2016-03-20 Thread Jakob Odersky
The error is very strange indeed, however without code that reproduces it, we can't really provide much help beyond speculation. One thing that stood out to me immediately is that you say you have an RDD of Any where every Any should be a BigDecimal, so why not specify that type information? When

Re: The error to read HDFS custom file in spark.

2016-03-19 Thread Jakob Odersky
Doesn't FileInputFormat require type parameters? Like so: class RawDataInputFormat[LW <: LongWritable, RD <: RDRawDataRecord] extends FileInputFormat[LW, RD] I haven't verified this but it could be related to the compile error you're getting. On Thu, Mar 17, 2016 at 9:53 AM, Benyi Wang

Re: installing packages with pyspark

2016-03-19 Thread Jakob Odersky
Hi, regarding 1, packages are resolved locally. That means that when you specify a package, spark-submit will resolve the dependencies and download any jars on the local machine, before shipping* them to the cluster. So, without a priori knowledge of dataproc clusters, it should be no different to

Re: installing packages with pyspark

2016-03-19 Thread Jakob Odersky
line of spark-submit or pyspark. See > http://spark.apache.org/docs/latest/submitting-applications.html > From: Jakob Odersky <ja...@odersky.com> > Sent: Thursday, March 17, 2016 6:40 PM > Subject: Re: installing pa

[jira] [Commented] (SPARK-7992) Hide private classes/objects in in generated Java API doc

2016-03-19 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197877#comment-15197877 ] Jakob Odersky commented on SPARK-7992: -- I'll check it out > Hide private classes/obje

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-03-18 Thread Jakob Odersky
I just experienced the issue, however retrying the download a second time worked. Could it be that there is some load balancer/cache in front of the archive and some nodes still serve the corrupt packages? On Fri, Mar 18, 2016 at 8:00 AM, Nicholas Chammas wrote: > I'm

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-03-18 Thread Jakob Odersky
com> wrote: > I just retried the Spark 1.6.1 / Hadoop 2.6 download and got a corrupt ZIP > file. > > Jakob, are you sure the ZIP unpacks correctly for you? Is it the same Spark > 1.6.1/Hadoop 2.6 package you had a success with? > > On Fri, Mar 18, 2016 at 6:11 PM Jakob Odersk

[jira] [Created] (SPARK-13929) Use Scala reflection for UDFs

2016-03-16 Thread Jakob Odersky (JIRA)
Jakob Odersky created SPARK-13929: - Summary: Use Scala reflection for UDFs Key: SPARK-13929 URL: https://issues.apache.org/jira/browse/SPARK-13929 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-13118) Support for classes defined in package objects

2016-03-16 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196928#comment-15196928 ] Jakob Odersky commented on SPARK-13118: --- Update: there was actually an issue with inner classes (or package

Re: Error building spark app with Maven

2016-03-15 Thread Jakob Odersky
spark-sql_2.10 1.5.1 [DEBUG] endProcessChildren: artifact=spark:scala:jar:1.0 [INFO]

[jira] [Commented] (SPARK-13118) Support for classes defined in package objects

2016-03-15 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196463#comment-15196463 ] Jakob Odersky commented on SPARK-13118: --- Should I remove the JIRA ID from my existing PR

Re: Error building spark app with Maven

2016-03-15 Thread Jakob Odersky
Hi Mich, probably unrelated to the current error you're seeing, however the following dependencies will bite you later: spark-hive_2.10 and spark-csv_2.11. The problem here is that you're using libraries built for different Scala binary versions (the numbers after the underscore). The simple fix here
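For illustration, in an sbt build the `%%` operator appends the project's Scala binary version automatically, which rules out this class of mismatch (the version numbers below are only examples, not a recommendation); in Maven the `_2.10`/`_2.11` suffixes must be kept consistent by hand:

```scala
// build.sbt -- %% expands to the _2.10 suffix because scalaVersion is 2.10.x,
// so both artifacts are guaranteed to target the same Scala binary version.
scalaVersion := "2.10.6"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-hive" % "1.5.1", // resolves spark-hive_2.10
  "com.databricks"   %% "spark-csv"  % "1.5.0"  // resolves spark-csv_2.10
)
```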

[jira] [Commented] (SPARK-13118) Support for classes defined in package objects

2016-03-14 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194465#comment-15194465 ] Jakob Odersky commented on SPARK-13118: --- Sure, I'll submit a PR with the test > Supp

Re: How to distribute dependent files (.so , jar ) across spark worker nodes

2016-03-14 Thread Jakob Odersky
Have you tried setting the configuration `spark.executor.extraLibraryPath` to point to a location where your .so's are available? (Not sure if non-local files, such as HDFS, are supported) On Mon, Mar 14, 2016 at 2:12 PM, Tristan Nixon wrote: > What build system are you

[jira] [Commented] (SPARK-13118) Support for classes defined in package objects

2016-03-14 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194163#comment-15194163 ] Jakob Odersky commented on SPARK-13118: --- [~marmbrus], what's the issue at hand? Creating a simple

[jira] [Commented] (SPARK-13118) Support for classes defined in package objects

2016-03-14 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193968#comment-15193968 ] Jakob Odersky commented on SPARK-13118: --- If I recall correctly, I couldn't reproduce the issue

Re: Installing Spark on Mac

2016-03-11 Thread Jakob Odersky
regarding my previous message, I forgot to mention to run netstat as root (sudo netstat -plunt) sorry for the noise On Fri, Mar 11, 2016 at 12:29 AM, Jakob Odersky <ja...@odersky.com> wrote: > Some more diagnostics/suggestions: > > 1) are other services listening to ports in the

Re: Installing Spark on Mac

2016-03-11 Thread Jakob Odersky
ommand env|grep SPARK; nothing comes back. > Tried env|grep Spark; which is the directory I created for Spark once I downloaded the tgz file; comes back with PWD=/Users/aidatefera/Spark. > Tried running ./bin/spark-shell ; come

Re: Installing Spark on Mac

2016-03-09 Thread Jakob Odersky
Sorry had a typo in my previous message: > try running just "/bin/spark-shell" please remove the leading slash (/) On Wed, Mar 9, 2016 at 1:39 PM, Aida Tefera wrote: > Hi there, tried echo $SPARK_HOME but nothing comes back so I guess I need to > set it. How would I do

Re: Installing Spark on Mac

2016-03-09 Thread Jakob Odersky
As Tristan mentioned, it looks as though Spark is trying to bind on port 0 and then 1 (which is not allowed). Could it be that some environment variables from you previous installation attempts are polluting your configuration? What does running "env | grep SPARK" show you? Also, try running just

Re: Installing Spark on Mac

2016-03-08 Thread Jakob Odersky
I've had some issues myself with the user-provided-Hadoop version. If you just want to get started, I would recommend downloading Spark (pre-built, with any of the Hadoop versions) as Cody suggested. A simple step-by-step guide: 1. curl

[jira] [Updated] (SPARK-13581) LibSVM throws MatchError

2016-02-29 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Odersky updated SPARK-13581: -- Description: When running an action on a DataFrame obtained by reading from a libsvm file

[jira] [Commented] (SPARK-13581) LibSVM throws MatchError

2016-02-29 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173058#comment-15173058 ] Jakob Odersky commented on SPARK-13581: --- It's in spark "data/mllib/sample_libsvm_dat

[jira] [Created] (SPARK-13581) LibSVM throws MatchError

2016-02-29 Thread Jakob Odersky (JIRA)
Jakob Odersky created SPARK-13581: - Summary: LibSVM throws MatchError Key: SPARK-13581 URL: https://issues.apache.org/jira/browse/SPARK-13581 Project: Spark Issue Type: Bug

Re: [discuss] DataFrame vs Dataset in Spark 2.0

2016-02-26 Thread Jakob Odersky
I would recommend (non-binding) option 1. Apart from the API breakage I can see only advantages, and that sole disadvantage is minimal for a few reasons: 1. the DataFrame API has been "Experimental" since its implementation, so no stability was ever implied 2. considering that the change is for

[jira] [Reopened] (SPARK-7768) Make user-defined type (UDT) API public

2016-02-26 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Odersky reopened SPARK-7768: -- > Make user-defined type (UDT) API pub

[jira] [Closed] (SPARK-7768) Make user-defined type (UDT) API public

2016-02-25 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Odersky closed SPARK-7768. Resolution: Fixed > Make user-defined type (UDT) API pub

[jira] [Commented] (SPARK-7768) Make user-defined type (UDT) API public

2016-02-25 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168155#comment-15168155 ] Jakob Odersky commented on SPARK-7768: -- [~marmbrus] UDTs are public now (in Scala at least), can

[jira] [Comment Edited] (SPARK-12878) Dataframe fails with nested User Defined Types

2016-02-25 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160119#comment-15160119 ] Jakob Odersky edited comment on SPARK-12878 at 2/25/16 10:22 PM: - I just

[jira] [Commented] (SPARK-10712) JVM crashes with spark.sql.tungsten.enabled = true

2016-02-25 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167911#comment-15167911 ] Jakob Odersky commented on SPARK-10712: --- Any news on this? Is it still an issue? > JVM cras

Re: How could I do this algorithm in Spark?

2016-02-24 Thread Jakob Odersky
Hi Guillermo, assuming that the first "a,b" is a typo and you actually meant "a,d", this is a sorting problem. You could easily model your data as an RDD of tuples (or as a dataframe/-set) and use the sortBy (or orderBy for dataframe/-sets) methods. best, --Jakob On Wed, Feb 24, 2016 at 2:26 PM,
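A minimal pure-Scala sketch of the sorting idea (the records and the sort key are made up, since the original data isn't shown; on an actual RDD the call has the same shape, e.g. `rdd.sortBy(_._2)`):

```scala
// Hypothetical (key, value) records; sort them by the value field.
val records = Seq(("a", 4), ("c", 1), ("b", 3))

// sortBy takes a function extracting the sort key from each element,
// just like RDD.sortBy or Dataset.orderBy in Spark.
val sorted = records.sortBy { case (_, value) => value }

println(sorted) // prints List((c,1), (b,3), (a,4))
```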
