Unable to acquire memory errors in HiveCompatibilitySuite

2015-09-14 Thread Pete Robbins
I keep hitting errors running the tests on 1.5 such as - join31 *** FAILED *** Failed to execute query using catalyst: Error: Job aborted due to stage failure: Task 9 in stage 3653.0 failed 1 times, most recent failure: Lost task 9.0 in stage 3653.0 (TID 123363, localhost):

Re: Spark Streaming..Exception

2015-09-14 Thread Akhil Das
You should consider upgrading your Spark from 1.3.0 to a higher version. Thanks Best Regards On Mon, Sep 14, 2015 at 2:28 PM, Priya Ch wrote: > Hi All, > > I came across the related old conversation on the above issue ( >

JavaRDD using Reflection

2015-09-14 Thread Rachana Srivastava
Hello all, I am working on a problem that requires us to create different sets of JavaRDDs based on different input arguments. We are getting the following error when we try to use a factory to create the JavaRDDs. The error message is clear, but I am wondering whether there is any workaround. Question: How to create

Re: [MLlib] Extensibility of MLlib classes (Word2VecModel etc.)

2015-09-14 Thread Joseph Bradley
We tend to resist opening up APIs unless there's a strong reason to and we feel reasonably confident that the API will remain stable. That allows us to make fixes if we realize there are issues with those APIs. But if you have an important use case, I'd recommend opening up a JIRA to discuss it.

Null Value in DecimalType column of DataFrame

2015-09-14 Thread Dirceu Semighini Filho
Hi all, I'm moving from Spark 1.4 to 1.5, and one of my tests is failing. It seems that there were some changes in org.apache.spark.sql.types.DecimalType. This ugly code is a little sample to reproduce the error; don't use it in your project. test("spark test") { val file =
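
A minimal sketch, assuming the failure involves an explicitly declared DecimalType column (the column name is hypothetical; the repro above is cut off):

    import org.apache.spark.sql.types._

    // Hypothetical one-column schema; DecimalType in 1.5 takes an explicit
    // precision and scale, and a value that overflows the declared precision
    // is returned as null when cast rather than raising an error.
    val schema = StructType(Seq(StructField("amount", DecimalType(10, 2), nullable = true)))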

JDBC Dialect tests

2015-09-14 Thread Luciano Resende
I was looking for the code mentioned in SPARK-9818 and SPARK-6136 that supposedly tests MySQL and PostgreSQL using Docker, and it seems that this code has been removed. Could anyone provide me a pointer to where these tests are actually located at the moment, and how they are integrated with

Re: JDBC Dialect tests

2015-09-14 Thread Reynold Xin
SPARK-9818, which you link to, actually links to a pull request trying to bring them back. On Mon, Sep 14, 2015 at 1:34 PM, Luciano Resende wrote: > I was looking for the code mentioned in SPARK-9818 and SPARK-6136 that > supposedly tests MySQL and PostgreSQL using Docker

RDD API patterns

2015-09-14 Thread sim
I'd like to get some feedback on an API design issue pertaining to RDDs. The design goal of avoiding RDD nesting, which I agree with, leads the methods operating on subsets of an RDD (not necessarily partitions) to use Iterable as an abstraction. The mapPartitions and groupBy* family of methods are
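
For reference, a sketch of the two abstractions in play (assuming a live SparkContext sc):

    import org.apache.spark.rdd.RDD

    val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))
    // groupByKey exposes each key's values as an Iterable:
    val grouped: RDD[(String, Iterable[Int])] = pairs.groupByKey()
    // mapPartitions exposes each partition as an Iterator:
    val partitionSums = pairs.mapPartitions(it => Iterator(it.map(_._2).sum))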

Re: JavaRDD using Reflection

2015-09-14 Thread Ankur Srivastava
It is not reflection that is the issue here but the use of an RDD transformation "featureKeyClassPair.map" inside "lines.mapToPair". From the code snippet you have sent it is not very clear whether getFeatureScore(id, data) invokes executeFeedFeatures, but if that is the case it is not very obvious that
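
A hedged sketch of the problem and the usual workaround (data and names are hypothetical; the original snippet is not shown in full):

    import org.apache.spark.{SparkConf, SparkContext}

    object NestedRddSketch extends App {
      val sc = new SparkContext(new SparkConf().setAppName("sketch").setMaster("local[2]"))
      val features = sc.parallelize(Seq("a" -> 1.0, "b" -> 2.0)) // small lookup RDD
      val lines = sc.parallelize(Seq("a", "b", "a"))

      // Invalid: an RDD cannot be referenced inside another RDD's transformation,
      // because transformations run on executors:
      // lines.map(id => features.lookup(id).head)

      // Workaround: collect the small RDD to the driver and broadcast it.
      val featureMap = sc.broadcast(features.collectAsMap())
      lines.map(id => featureMap.value.getOrElse(id, 0.0)).collect().foreach(println)
      sc.stop()
    }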

Re: Unable to acquire memory errors in HiveCompatibilitySuite

2015-09-14 Thread Reynold Xin
Pete - can you do me a favor? https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/shuffle/ShuffleMemoryManager.scala#L174 Print the parameters that are passed into the getPageSize function, and check their values. On Mon, Sep 14, 2015 at 4:32 PM, Reynold Xin

And.eval short circuiting

2015-09-14 Thread Zack Sampson
It seems like And.eval can avoid calculating right.eval if left.eval returns null. Is there a reason it's written like it is? override def eval(input: Row): Any = { val l = left.eval(input) if (l == false) { false } else { val r = right.eval(input) if (r == false) {
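
For context: under SQL's three-valued logic, false dominates null (null AND false is false), so when left.eval returns null the result can still depend on the right side and right.eval cannot simply be skipped. A standalone sketch of the semantics:

    // Three-valued AND: the right side matters even when the left is null.
    def and3(l: Any, r: Any): Any =
      if (l == false || r == false) false
      else if (l == null || r == null) null
      else true

    // and3(null, false) == false; and3(null, true) == null; and3(true, true) == true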

RE: Enum parameter in ML

2015-09-14 Thread Ulanov, Alexander
Hi Feynman, Thank you for the suggestion. How can I ensure that there will be no problems for Java users? (I only use the Scala API.) Best regards, Alexander From: Feynman Liang [mailto:fli...@databricks.com] Sent: Monday, September 14, 2015 5:27 PM To: Ulanov, Alexander Cc: dev@spark.apache.org

Re: Enum parameter in ML

2015-09-14 Thread Feynman Liang
Since PipelineStages are serializable, the params must also be serializable. We also have to keep the Java API in mind. Introducing a new enum Param type may work, but we will have to ensure that Java users can use it without dealing with ClassTags (I believe Scala will create new types for each

Re: Enum parameter in ML

2015-09-14 Thread Feynman Liang
We usually write a Java test suite which exercises the public API (e.g. DCT). It may be possible to create a sealed trait with singleton concrete instances inside of a serializable
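
A hedged sketch of the sealed-trait idea (hypothetical names, not an existing MLlib API):

    // Enum-like values as serializable singletons; a String-typed Param
    // validated against these names keeps the Java-facing API free of
    // ClassTag concerns.
    sealed trait Solver extends Serializable { def name: String }
    case object LBFGS extends Solver { val name = "l-bfgs" }
    case object SGD extends Solver { val name = "sgd" }

    object Solver {
      val supported: Set[String] = Set(LBFGS, SGD).map(_.name)
      def fromName(s: String): Solver = s match {
        case "l-bfgs" => LBFGS
        case "sgd" => SGD
        case other => throw new IllegalArgumentException("Unknown solver: " + other)
      }
    }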

Data frame with one column

2015-09-14 Thread Ulanov, Alexander
Dear Spark developers, I would like to create a dataframe with one column. However, the createDataFrame method accepts at least a Product: val data = Seq(1.0, 2.0) val rdd = sc.parallelize(data, 2) val df = sqlContext.createDataFrame(rdd) [fail]:25: error: overloaded method value

Re: Data frame with one column

2015-09-14 Thread Feynman Liang
For an example, see the ml-feature word2vec user guide. On Mon, Sep 14, 2015 at 11:03 AM, Feynman Liang wrote: > You could use `Tuple1(x)` instead of `Hack` > > On Mon, Sep 14, 2015 at 10:50 AM, Ulanov,

Re: Data frame with one column

2015-09-14 Thread Feynman Liang
You could use `Tuple1(x)` instead of `Hack`. On Mon, Sep 14, 2015 at 10:50 AM, Ulanov, Alexander <alexander.ula...@hpe.com> wrote: > Dear Spark developers, > > > > I would like to create a dataframe with one column. However, the > createDataFrame method accepts at least a Product: > > > > val
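
A minimal sketch of the workaround, reusing the variables from the original snippet (assumes a live sc and sqlContext):

    val data = Seq(1.0, 2.0)
    val rdd = sc.parallelize(data, 2)
    // Tuple1 is a Product, so the Product-based createDataFrame overload applies:
    val df = sqlContext.createDataFrame(rdd.map(Tuple1.apply)).toDF("value")
    df.show()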

RE: Data frame with one column

2015-09-14 Thread Ulanov, Alexander
Thank you for the quick response! I'll use Tuple1. From: Feynman Liang [mailto:fli...@databricks.com] Sent: Monday, September 14, 2015 11:05 AM To: Ulanov, Alexander Cc: dev@spark.apache.org Subject: Re: Data frame with one column For an example, see the ml-feature word2vec user

Spark 1.5.1 release

2015-09-14 Thread Reynold Xin
Hi devs, FYI - we have already accumulated an "interesting" list of issues found with the 1.5.0 release. I will work on an RC in the next week or two, depending on how many blocker/critical issues are fixed. https://issues.apache.org/jira/issues/?filter=1221

ML: embed a transformer

2015-09-14 Thread Saif.A.Ellafi
Hi all, I'm very new to Spark and looking forward to getting deep into the topic. Right now I am trying to implement my own transformer by inheritance, but from what I am reading so far, the APIs for this practice do not seem very public to us "users". I am defining my transformer based on the Binarizer, but simply

Fwd: JobScheduler: Error generating jobs for time for custom InputDStream

2015-09-14 Thread Juan Rodríguez Hortalá
Hi, I sent this message to the user list a few weeks ago with no luck, so I'm forwarding it to the dev list in case someone could lend a hand with this. Thanks a lot in advance. I've developed a ScalaCheck property for testing Spark Streaming transformations. To do that I had to develop a custom

Re: ML: embed a transformer

2015-09-14 Thread Feynman Liang
Where did you read that it should be public? The traits in ml.param.shared are meant to be used across internal spark.ml transformer implementations. If your transformer could be included in spark.ml, then I would recommend implementing it there so these package-private traits can be reused.
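
A hedged sketch of a user-land transformer in 1.5 that defines its own params instead of the package-private shared traits (class and column names are hypothetical):

    import org.apache.spark.ml.Transformer
    import org.apache.spark.ml.param.{Param, ParamMap}
    import org.apache.spark.ml.util.Identifiable
    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.{col, udf}
    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    class UpperCaser(override val uid: String) extends Transformer {
      def this() = this(Identifiable.randomUID("upperCaser"))

      // Own inputCol/outputCol params, since the shared traits are package-private.
      val inputCol = new Param[String](this, "inputCol", "input column name")
      val outputCol = new Param[String](this, "outputCol", "output column name")
      def setInputCol(value: String): this.type = set(inputCol, value)
      def setOutputCol(value: String): this.type = set(outputCol, value)

      override def transform(dataset: DataFrame): DataFrame = {
        val upper = udf { s: String => if (s == null) null else s.toUpperCase }
        dataset.withColumn($(outputCol), upper(col($(inputCol))))
      }

      override def transformSchema(schema: StructType): StructType =
        StructType(schema.fields :+ StructField($(outputCol), StringType, nullable = true))

      override def copy(extra: ParamMap): UpperCaser = defaultCopy(extra)
    }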

RE: ML: embed a transformer

2015-09-14 Thread Saif.A.Ellafi
Thank you, I will do as you suggested. PS: I read that in a random user-list archive post I found: http://mail-archives.us.apache.org/mod_mbox/spark-user/201506.mbox/%3c55709f7b.2090...@gmail.com%3E Saif From: Feynman Liang [mailto:fli...@databricks.com] Sent: Monday, September 14, 2015 4:08 PM To:

Re: JavaRDD using Reflection

2015-09-14 Thread Ajay Singal
Hello Rachana, The easiest way would be to start by creating a 'parent' JavaRDD and running different filters (based on different input arguments) to create the respective 'child' JavaRDDs dynamically. Notice that the creation of these child RDDs is handled by the application driver. Hope this
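
The pattern, sketched in Scala for brevity (the JavaRDD filter API is analogous; the input path and prefixes are hypothetical):

    // One parent RDD; child RDDs derived lazily via filters chosen from input arguments.
    val parent = sc.textFile("input.txt")
    def childFor(prefix: String) = parent.filter(_.startsWith(prefix))
    val errors = childFor("ERROR")
    val warnings = childFor("WARN")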

An alternate UI for Spark.

2015-09-14 Thread Prashant Sharma
Hi all, TL;DR: Some of my colleagues at Imaginea are interested in building an alternate UI for Spark; basically, allow people or groups to build an alternate UI for Spark. More details: looking at feasibility, it definitely feels possible to do. But we need a consensus on a public(can be

Re: An alternate UI for Spark.

2015-09-14 Thread Ryan Williams
You can check out Spree for one data point about how this can be done; it is a near-clone of the Spark web UI that updates in real time. It uses JsonRelay, a SparkListener that sends events as JSON over the

Re: Spark Streaming..Exception

2015-09-14 Thread Priya Ch
Hi All, I came across the related old conversation on the above issue (https://issues.apache.org/jira/browse/SPARK-5594). Is the issue fixed? I tried different values for spark.cleaner.ttl -> 0sec, -1sec, 2000sec, ... none of them worked. I also tried setting spark.streaming.unpersist -> true.