[jira] [Commented] (SPARK-10669) Link to each language's API in codetabs in ML docs: spark.mllib

2015-10-02 Thread Xin Ren (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940844#comment-14940844
 ] 

Xin Ren commented on SPARK-10669:
-

Hi Joseph,

I'm just double-checking with you that I'm doing this right. I made a test 
change by adding an external link; could you please confirm I'm doing the 
right thing?
https://github.com/keypointt/spark/commit/f8289891d5b32fffdc6a4ce077d8d206e015119f

Also, I'm not quite sure what you mean by "codetabs". I see some terms linking 
to Wikipedia and some to Spark internal files. Could you please give me a quick 
example of this?

As I understand it, an example is the "ChiSqSelector" section in 
https://github.com/apache/spark/blob/64743870f23bffb8d96dcc8a0181c1452782a151/docs/mllib-feature-extraction.md,
which adds a Wikipedia link to "Feature selection".

As I understand it, all the .md files below should be modified to link to the 
APIs in a consistent way:
* mllib-classification-regression.md
* mllib-clustering.md   
* mllib-collaborative-filtering.md  
* mllib-data-types.md   
* mllib-decision-tree.md
* mllib-dimensionality-reduction.md 
* mllib-ensembles.md
* mllib-evaluation-metrics.md   
* mllib-feature-extraction.md   
* mllib-frequent-pattern-mining.md  
* mllib-guide.md
* mllib-isotonic-regression.md  
* mllib-linear-methods.md
* mllib-migration-guides.md
* mllib-naive-bayes.md
* mllib-optimization.md
* mllib-pmml-model-export.md
* mllib-statistics.md

Thank you very much! :)

> Link to each language's API in codetabs in ML docs: spark.mllib
> ---
>
> Key: SPARK-10669
> URL: https://issues.apache.org/jira/browse/SPARK-10669
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, MLlib
>Reporter: Joseph K. Bradley
>
> In the Markdown docs for the spark.mllib Programming Guide, we have code 
> examples with codetabs for each language.  We should link to each language's 
> API docs within the corresponding codetab, but we are inconsistent about 
> this.  For an example of what we want to do, see the "ChiSqSelector" section 
> in 
> [https://github.com/apache/spark/blob/64743870f23bffb8d96dcc8a0181c1452782a151/docs/mllib-feature-extraction.md]
> This JIRA is just for spark.mllib, not spark.ml






[jira] [Commented] (SPARK-10880) Hive module build test failed

2015-10-02 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SPARK-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940862#comment-14940862
 ] 

Jean-Baptiste Onofré commented on SPARK-10880:
--

Just updated my local copy, I'm testing.

> Hive module build test failed
> -
>
> Key: SPARK-10880
> URL: https://issues.apache.org/jira/browse/SPARK-10880
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Reporter: Jean-Baptiste Onofré
>
> On master, sql/hive module tests fail.
> The reason is that bin/spark-submit is not found. The impacted tests are:
> - SPARK-8468
> - SPARK-8020
> - SPARK-8489
> - SPARK-9757
> I'm going to take a look and fix that.






[jira] [Commented] (SPARK-10880) Hive module build test failed

2015-10-02 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SPARK-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940877#comment-14940877
 ] 

Jean-Baptiste Onofré commented on SPARK-10880:
--

So far so good, sorry for the noise.

> Hive module build test failed
> -
>
> Key: SPARK-10880
> URL: https://issues.apache.org/jira/browse/SPARK-10880
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Reporter: Jean-Baptiste Onofré
>
> On master, sql/hive module tests fail.
> The reason is that bin/spark-submit is not found. The impacted tests are:
> - SPARK-8468
> - SPARK-8020
> - SPARK-8489
> - SPARK-9757
> I'm going to take a look and fix that.






[jira] [Assigned] (SPARK-10905) Export freqItems() for DataFrameStatFunctions in SparkR

2015-10-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10905:


Assignee: Apache Spark

> Export freqItems() for DataFrameStatFunctions in SparkR
> ---
>
> Key: SPARK-10905
> URL: https://issues.apache.org/jira/browse/SPARK-10905
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 1.5.0
>Reporter: rerngvit yanggratoke
>Assignee: Apache Spark
> Fix For: 1.6.0
>
>
> Currently only crosstab() is implemented. This subtask is about adding the 
> freqItems() API to SparkR.
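
For reference, the Scala-side API that the SparkR wrapper would surface already exists as DataFrameStatFunctions.freqItems; a minimal sketch, assuming a SQLContext named sqlContext is in scope and using placeholder column names:

{code}
// Build a small DataFrame, then ask for items that appear in at least 40% of rows.
val df = sqlContext.createDataFrame(Seq((1, 2), (1, 3), (1, 4))).toDF("a", "b")
val freq = df.stat.freqItems(Array("a", "b"), 0.4)
freq.show()
{code}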






[jira] [Commented] (SPARK-10905) Export freqItems() for DataFrameStatFunctions in SparkR

2015-10-02 Thread rerngvit yanggratoke (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940890#comment-14940890
 ] 

rerngvit yanggratoke commented on SPARK-10905:
--

[~shivaram] I have a PR for this issue. Please have a look.

> Export freqItems() for DataFrameStatFunctions in SparkR
> ---
>
> Key: SPARK-10905
> URL: https://issues.apache.org/jira/browse/SPARK-10905
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 1.5.0
>Reporter: rerngvit yanggratoke
> Fix For: 1.6.0
>
>
> Currently only crosstab() is implemented. This subtask is about adding the 
> freqItems() API to SparkR.






[jira] [Assigned] (SPARK-10905) Export freqItems() for DataFrameStatFunctions in SparkR

2015-10-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10905:


Assignee: (was: Apache Spark)

> Export freqItems() for DataFrameStatFunctions in SparkR
> ---
>
> Key: SPARK-10905
> URL: https://issues.apache.org/jira/browse/SPARK-10905
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 1.5.0
>Reporter: rerngvit yanggratoke
> Fix For: 1.6.0
>
>
> Currently only crosstab() is implemented. This subtask is about adding the 
> freqItems() API to SparkR.






[jira] [Commented] (SPARK-10905) Export freqItems() for DataFrameStatFunctions in SparkR

2015-10-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940891#comment-14940891
 ] 

Apache Spark commented on SPARK-10905:
--

User 'rerngvit' has created a pull request for this issue:
https://github.com/apache/spark/pull/8962

> Export freqItems() for DataFrameStatFunctions in SparkR
> ---
>
> Key: SPARK-10905
> URL: https://issues.apache.org/jira/browse/SPARK-10905
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 1.5.0
>Reporter: rerngvit yanggratoke
> Fix For: 1.6.0
>
>
> Currently only crosstab() is implemented. This subtask is about adding the 
> freqItems() API to SparkR.






[jira] [Created] (SPARK-10909) Spark sql jdbc fails for Oracle NUMBER type columns

2015-10-02 Thread Kostas papageorgopoulos (JIRA)
Kostas papageorgopoulos created SPARK-10909:
---

 Summary: Spark sql jdbc fails for Oracle NUMBER type columns
 Key: SPARK-10909
 URL: https://issues.apache.org/jira/browse/SPARK-10909
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.5.0
 Environment: Dev
Reporter: Kostas papageorgopoulos
 Fix For: 1.5.1


When using Spark SQL to connect to Oracle and run a query, I get the following 
exception: "requirement failed: Overflowed precision". This is triggered when the 
dbtable definition includes an Oracle NUMBER column.

{code}
SQLContext sqlContext = new SQLContext(sc);
Map<String, String> options = new HashMap<>();
options.put("driver", "oracle.jdbc.OracleDriver");
options.put("user", "USER");
options.put("password", "PASS");
options.put("url", "ORACLE CONNECTION URL");
options.put("dbtable", "(select VARCHAR_COLUMN, TIMESTAMP_COLUMN, NUMBER_COLUMN from lsc_subscription_profiles)");

DataFrame jdbcDF = sqlContext.read().format("jdbc").options(options).load();

jdbcDF.toJavaRDD().saveAsTextFile("hdfs://hdfshost:8020" + "/path/to/write.bz2", BZip2Codec.class);
{code}

using driver:

{code}
<dependency>
  <groupId>com.oracle</groupId>
  <artifactId>ojdbc6</artifactId>
  <version>11.2.0.3.0</version>
</dependency>
{code}
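
A hedged workaround sketch (not part of the original report): bounding the NUMBER column's precision and scale inside the dbtable subquery usually lets the JDBC source map it to a DecimalType it can hold. The Scala below reuses the reporter's table and column names; the connection URL is a placeholder.

{code}
// Cast the unbounded NUMBER to an explicit precision/scale so the Decimal values
// built by the JDBC source do not overflow.
val jdbcDF = sqlContext.read.format("jdbc").options(Map(
  "driver"   -> "oracle.jdbc.OracleDriver",
  "url"      -> "jdbc:oracle:thin:@//dbhost:1521/SERVICE",  // placeholder URL
  "user"     -> "USER",
  "password" -> "PASS",
  "dbtable"  -> ("(select VARCHAR_COLUMN, TIMESTAMP_COLUMN, " +
                 "CAST(NUMBER_COLUMN AS NUMBER(19,4)) AS NUMBER_COLUMN " +
                 "from lsc_subscription_profiles)")
)).load()
{code}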

{code}

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 
3, 10.130.35.52): java.lang.IllegalArgumentException: requirement failed: 
Overflowed precision
at scala.Predef$.require(Predef.scala:233)
at org.apache.spark.sql.types.Decimal.set(Decimal.scala:111)
at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:335)
at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.getNext(JDBCRDD.scala:406)
at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.hasNext(JDBCRDD.scala:472)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply$mcV$sp(PairRDDFunctions.scala:1108)
at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply(PairRDDFunctions.scala:1108)
at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply(PairRDDFunctions.scala:1108)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1206)
at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1116)
at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1095)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1280)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1268)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1267)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1267)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1493)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1455)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1444)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at 
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
at org.apache.spark.SparkContext.runJob(Spark

[jira] [Updated] (SPARK-10896) Parquet join issue

2015-10-02 Thread Tamas Szuromi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tamas Szuromi updated SPARK-10896:
--
Affects Version/s: 1.5.1

> Parquet join issue
> --
>
> Key: SPARK-10896
> URL: https://issues.apache.org/jira/browse/SPARK-10896
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0, 1.5.1
> Environment: spark-1.5.0-bin-hadoop2.6.tgz with HDP 2.3
>Reporter: Tamas Szuromi
>  Labels: dataframe, hdfs, join, parquet, sql
>
> After loading the parquet files, the join does not work correctly.
> How to reproduce:
> {code:java}
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> val arr1 = Array[Row](Row.apply(0, 0), Row.apply(1,1), Row.apply(2,2), 
> Row.apply(3, 3), Row.apply(4, 4), Row.apply(5, 5), Row.apply(6, 6), 
> Row.apply(7, 7))
> val schema1 = StructType(
>   StructField("id", IntegerType) ::
>   StructField("value1", IntegerType) :: Nil)
> val df1 = sqlContext.createDataFrame(sc.parallelize(arr1), schema1)
> val arr2 = Array[Row](Row.apply(0, 0), Row.apply(1,1), Row.apply(2,2), 
> Row.apply(3, 3), Row.apply(4, 4), Row.apply(5, 5), Row.apply(6, 6), 
> Row.apply(7, 7))
> val schema2 = StructType(
>   StructField("otherId", IntegerType) ::
>   StructField("value2", IntegerType) :: Nil)
> val df2 = sqlContext.createDataFrame(sc.parallelize(arr2), schema2)
> val res = df1.join(df2, df1("id")===df2("otherId"))
> df1.take(10)
> df2.take(10)
> res.count()
> res.take(10)
> df1.write.format("parquet").save("hdfs:///tmp/df1")
> df2.write.format("parquet").save("hdfs:///tmp/df2")
> val df1=sqlContext.read.parquet("hdfs:///tmp/df1/*.parquet")
> val df2=sqlContext.read.parquet("hdfs:///tmp/df2/*.parquet")
> val res = df1.join(df2, df1("id")===df2("otherId"))
> df1.take(10)
> df2.take(10)
> res.count()
> res.take(10)
> {code}
> Output
> {code:java}
> Array[org.apache.spark.sql.Row] = Array([0,0], [1,1], [2,2], [3,3], [4,4], 
> [5,5], [6,6], [7,7]) 
> Array[org.apache.spark.sql.Row] = Array([0,0], [1,1], [2,2], [3,3], [4,4], 
> [5,5], [6,6], [7,7]) 
> Long = 8 
> Array[org.apache.spark.sql.Row] = Array([0,0,0,0], [1,1,1,1], [2,2,2,2], 
> [3,3,3,3], [4,4,4,4], [5,5,5,5], [6,6,6,6], [7,7,7,7]) 
> {code}
> After reading back:
> {code:java}
> Array[org.apache.spark.sql.Row] = Array([0,0], [1,1], [2,2], [3,3], [4,4], 
> [5,5], [6,6], [7,7]) 
> Array[org.apache.spark.sql.Row] = Array([0,0], [1,1], [2,2], [3,3], [4,4], 
> [5,5], [6,6], [7,7]) 
> Long = 4 
> Array[org.apache.spark.sql.Row] = Array([0,0,0,5], [2,2,2,null], [4,4,4,5], 
> [6,6,6,null])
> {code}
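
A hedged diagnostic step, not part of the report: comparing the extended query plans before and after the parquet round trip can show whether a different join strategy or column resolution is picked up on the re-read DataFrames.

{code}
// Print the parsed/analyzed/optimized/physical plans; run on both versions of res.
res.explain(true)
{code}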






[jira] [Updated] (SPARK-10909) Spark sql jdbc fails for Oracle NUMBER type columns

2015-10-02 Thread Kostas papageorgopoulos (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kostas papageorgopoulos updated SPARK-10909:

Description: 
When using Spark SQL to connect to Oracle and run a query, I get the following 
exception: "requirement failed: Overflowed precision". This is triggered when the 
dbtable definition includes an Oracle NUMBER column.

{code}
SQLContext sqlContext = new SQLContext(sc);
Map<String, String> options = new HashMap<>();
options.put("driver", "oracle.jdbc.OracleDriver");
options.put("user", "USER");
options.put("password", "PASS");
options.put("url", "ORACLE CONNECTION URL");
options.put("dbtable", "(select VARCHAR_COLUMN, TIMESTAMP_COLUMN, NUMBER_COLUMN from lsc_subscription_profiles)");

DataFrame jdbcDF = sqlContext.read().format("jdbc").options(options).load();

jdbcDF.toJavaRDD().saveAsTextFile("hdfs://hdfshost:8020" + "/path/to/write.bz2", BZip2Codec.class);
{code}

using driver:

{code}
<dependency>
  <groupId>com.oracle</groupId>
  <artifactId>ojdbc6</artifactId>
  <version>11.2.0.3.0</version>
</dependency>
{code}

Using Sun Java JDK 1.8.0_51 along with Spring 4.

The classpath of the JUnit run is:

{code}
/home/kostas/dev2/tools/jdk1.8.0_51/bin/java 
-agentlib:jdwp=transport=dt_socket,address=127.0.0.1:42901,suspend=y,server=n 
-ea -Duser.timezone=Africa/Cairo -Dfile.encoding=UTF-8 -classpath 
/home/kostas/dev2/tools/idea-IU-141.178.9/lib/idea_rt.jar:/home/kostas/dev2/tools/idea-IU-141.178.9/plugins/junit/lib/junit-rt.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/jfxswt.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/deploy.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/charsets.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/rt.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/javaws.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/jce.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/resources.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/plugin.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/jfr.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/jsse.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/management-agent.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/sunjce_provider.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/sunec.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/localedata.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/jfxrt.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/cldrdata.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/nashorn.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/zipfs.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/sunpkcs11.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/dnsns.jar:/home/kostas/dev2/projects/atlas_reporting/atlas-core/target/test-classes:/home/kostas/dev2/projects/atlas_reporting/atlas-core/target/classes:/home/kostas/.m2/repository/org/apache/spark/spark-core_2.10/1.5.0/spark-core_2.10-1.5.0.jar:/home/kostas/.m2/repository/org/apache/avro/avro-mapred/1.7.7/avro-mapred-1.7.7-hadoop2.jar:/home/kostas/.m2/repository/org/apache/avro/avro-ipc/1.7.7/avro-ipc-1.7.7.jar:/home/kostas/.m2/repository/org/apache/avro/avro-ipc/1.7.7/avro-ipc-1.7.7-tests.jar:/home/kostas/.m2/repository/com/twitter/chill_2.10/0.5.0/chill_2.10-0.5.0.jar:/home/kostas/.m2/repository/com/esotericsoftware/kryo/kryo/2.21/kryo-2.21.jar:/home/kostas/.m2/repository/com/esotericsoftware/reflectasm/reflectasm/1.07/reflectasm-1.07-shaded.jar:/home/kostas/.m2/repository/com/esotericsoftware/minlog/minlog/1.2/minlog-1.2.jar:/home/kostas/.m2/repository/org/objenesis/objenesis/1.2/objenesis-1.2.jar:/home/kostas/.m2/repository/com/twitter/chill-java/0.5.0/chill-java-0.5.0.jar:/home/kostas/.m2/repository/org/apache/spark/spark-launcher_2.10/1.5.0/spark-launcher_2.10-1.5.0.jar:/home/kostas/.m2/repository/org/apache/spark/spark-network-common_2.10/1.5.0/spark-network-common_2.10-1.5.0.jar:/home/kostas/.m2/repository/org/apache/spark/spark-network-shuffle_2.10/1.5.0/spark-network-shuffle_2.10-1.5.0.jar:/home/kostas/.m2/repository/org/apache/spark/spark-unsafe_2.10/1.5.0/spark-unsafe_2.10-1.5.0.jar:/home/kostas/.m2/repository/net/java/dev/jets3t/jets3t/0.7.1/jets3t-0.7.1.jar:/home/kostas/.m2/repository/org/apache/curator/curator-recipes/2.4.0/curator-recipes-2.4.0.jar:/home/kostas/.m2/repository/org/apache/curator/curator-framework/2.4.0/curator-framework-2.4.0.jar:/home/kostas/.m2/repository/org/eclipse/jetty/orbit/javax.servlet/3.0.0.v201112011016/javax.servlet-3.0.0.v201112011016.jar:/home/kostas/.m2/repository/org/apache/commons/commons-lang3/3.3.2/commons-lang3-3.3.2.jar:/home/kostas/.m2/repository/org/apache/commons/commons-math3/3.4.1/commons-math3-3.4.1.jar:/home/kostas/.m2/repository/com/google/code/findbugs/jsr305/1.3.9/jsr305-1.3.9.jar:/home/kostas/.m2/repository/org/slf4j/slf4j-api/1.7.10/slf4j-api-1.7.10.jar:/home/kostas/.m2/repository/com/ning/compress-lzf/1.0.3/compress-lzf-1.0.3.jar:/home/kostas/

[jira] [Updated] (SPARK-10896) Parquet join issue

2015-10-02 Thread Tamas Szuromi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tamas Szuromi updated SPARK-10896:
--
Fix Version/s: 1.6.0
   1.5.1

> Parquet join issue
> --
>
> Key: SPARK-10896
> URL: https://issues.apache.org/jira/browse/SPARK-10896
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0, 1.5.1
> Environment: spark-1.5.0-bin-hadoop2.6.tgz with HDP 2.3
>Reporter: Tamas Szuromi
>  Labels: dataframe, hdfs, join, parquet, sql
> Fix For: 1.5.1, 1.6.0
>
>
> After loading the parquet files, the join does not work correctly.
> How to reproduce:
> {code:java}
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> val arr1 = Array[Row](Row.apply(0, 0), Row.apply(1,1), Row.apply(2,2), 
> Row.apply(3, 3), Row.apply(4, 4), Row.apply(5, 5), Row.apply(6, 6), 
> Row.apply(7, 7))
> val schema1 = StructType(
>   StructField("id", IntegerType) ::
>   StructField("value1", IntegerType) :: Nil)
> val df1 = sqlContext.createDataFrame(sc.parallelize(arr1), schema1)
> val arr2 = Array[Row](Row.apply(0, 0), Row.apply(1,1), Row.apply(2,2), 
> Row.apply(3, 3), Row.apply(4, 4), Row.apply(5, 5), Row.apply(6, 6), 
> Row.apply(7, 7))
> val schema2 = StructType(
>   StructField("otherId", IntegerType) ::
>   StructField("value2", IntegerType) :: Nil)
> val df2 = sqlContext.createDataFrame(sc.parallelize(arr2), schema2)
> val res = df1.join(df2, df1("id")===df2("otherId"))
> df1.take(10)
> df2.take(10)
> res.count()
> res.take(10)
> df1.write.format("parquet").save("hdfs:///tmp/df1")
> df2.write.format("parquet").save("hdfs:///tmp/df2")
> val df1=sqlContext.read.parquet("hdfs:///tmp/df1/*.parquet")
> val df2=sqlContext.read.parquet("hdfs:///tmp/df2/*.parquet")
> val res = df1.join(df2, df1("id")===df2("otherId"))
> df1.take(10)
> df2.take(10)
> res.count()
> res.take(10)
> {code}
> Output
> {code:java}
> Array[org.apache.spark.sql.Row] = Array([0,0], [1,1], [2,2], [3,3], [4,4], 
> [5,5], [6,6], [7,7]) 
> Array[org.apache.spark.sql.Row] = Array([0,0], [1,1], [2,2], [3,3], [4,4], 
> [5,5], [6,6], [7,7]) 
> Long = 8 
> Array[org.apache.spark.sql.Row] = Array([0,0,0,0], [1,1,1,1], [2,2,2,2], 
> [3,3,3,3], [4,4,4,4], [5,5,5,5], [6,6,6,6], [7,7,7,7]) 
> {code}
> After reading back:
> {code:java}
> Array[org.apache.spark.sql.Row] = Array([0,0], [1,1], [2,2], [3,3], [4,4], 
> [5,5], [6,6], [7,7]) 
> Array[org.apache.spark.sql.Row] = Array([0,0], [1,1], [2,2], [3,3], [4,4], 
> [5,5], [6,6], [7,7]) 
> Long = 4 
> Array[org.apache.spark.sql.Row] = Array([0,0,0,5], [2,2,2,null], [4,4,4,5], 
> [6,6,6,null])
> {code}






[jira] [Updated] (SPARK-10896) Parquet join issue

2015-10-02 Thread Tamas Szuromi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tamas Szuromi updated SPARK-10896:
--
Fix Version/s: (was: 1.5.1)
   (was: 1.6.0)

> Parquet join issue
> --
>
> Key: SPARK-10896
> URL: https://issues.apache.org/jira/browse/SPARK-10896
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0, 1.5.1
> Environment: spark-1.5.0-bin-hadoop2.6.tgz with HDP 2.3
>Reporter: Tamas Szuromi
>  Labels: dataframe, hdfs, join, parquet, sql
>
> After loading the parquet files, the join does not work correctly.
> How to reproduce:
> {code:java}
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> val arr1 = Array[Row](Row.apply(0, 0), Row.apply(1,1), Row.apply(2,2), 
> Row.apply(3, 3), Row.apply(4, 4), Row.apply(5, 5), Row.apply(6, 6), 
> Row.apply(7, 7))
> val schema1 = StructType(
>   StructField("id", IntegerType) ::
>   StructField("value1", IntegerType) :: Nil)
> val df1 = sqlContext.createDataFrame(sc.parallelize(arr1), schema1)
> val arr2 = Array[Row](Row.apply(0, 0), Row.apply(1,1), Row.apply(2,2), 
> Row.apply(3, 3), Row.apply(4, 4), Row.apply(5, 5), Row.apply(6, 6), 
> Row.apply(7, 7))
> val schema2 = StructType(
>   StructField("otherId", IntegerType) ::
>   StructField("value2", IntegerType) :: Nil)
> val df2 = sqlContext.createDataFrame(sc.parallelize(arr2), schema2)
> val res = df1.join(df2, df1("id")===df2("otherId"))
> df1.take(10)
> df2.take(10)
> res.count()
> res.take(10)
> df1.write.format("parquet").save("hdfs:///tmp/df1")
> df2.write.format("parquet").save("hdfs:///tmp/df2")
> val df1=sqlContext.read.parquet("hdfs:///tmp/df1/*.parquet")
> val df2=sqlContext.read.parquet("hdfs:///tmp/df2/*.parquet")
> val res = df1.join(df2, df1("id")===df2("otherId"))
> df1.take(10)
> df2.take(10)
> res.count()
> res.take(10)
> {code}
> Output
> {code:java}
> Array[org.apache.spark.sql.Row] = Array([0,0], [1,1], [2,2], [3,3], [4,4], 
> [5,5], [6,6], [7,7]) 
> Array[org.apache.spark.sql.Row] = Array([0,0], [1,1], [2,2], [3,3], [4,4], 
> [5,5], [6,6], [7,7]) 
> Long = 8 
> Array[org.apache.spark.sql.Row] = Array([0,0,0,0], [1,1,1,1], [2,2,2,2], 
> [3,3,3,3], [4,4,4,4], [5,5,5,5], [6,6,6,6], [7,7,7,7]) 
> {code}
> After reading back:
> {code:java}
> Array[org.apache.spark.sql.Row] = Array([0,0], [1,1], [2,2], [3,3], [4,4], 
> [5,5], [6,6], [7,7]) 
> Array[org.apache.spark.sql.Row] = Array([0,0], [1,1], [2,2], [3,3], [4,4], 
> [5,5], [6,6], [7,7]) 
> Long = 4 
> Array[org.apache.spark.sql.Row] = Array([0,0,0,5], [2,2,2,null], [4,4,4,5], 
> [6,6,6,null])
> {code}






[jira] [Commented] (SPARK-10873) can't sort columns on history page

2015-10-02 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941109#comment-14941109
 ] 

Steve Loughran commented on SPARK-10873:


Note that the problem appears to arise in the specific case of more than one 
attempt per job. If you only have one attempt, the attempts column doesn't appear 
and this problem isn't visible (at least in my incomplete experiments).

I suspect that always listing an attempt would be sufficient to restore correct 
behaviour.

> can't sort columns on history page
> --
>
> Key: SPARK-10873
> URL: https://issues.apache.org/jira/browse/SPARK-10873
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>
> Starting with 1.5.1, the history server page doesn't allow sorting by column.






[jira] [Commented] (SPARK-5925) YARN - Spark progress bar stucks at 10% but after finishing shows 100%

2015-10-02 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941112#comment-14941112
 ] 

Steve Loughran commented on SPARK-5925:
---

I've actually thought about how to do this as part of SPARK-1537, but was 
waiting to get the core patch done first.

What needs to be done is *relatively* straightforward:

For every incomplete app loaded into the history server, something needs to GET 
updated events and forward them to the app UI.

That's the core concept; implementation details need to consider
* completion of app
* removal of app from cache & need to ensure no leaks by retaining app ui links 
elsewhere
* the fact that ATS doesn't do blocking reads, so polling on a schedule is 
required
* the need to keep load on the history server down so as to avoid overloading 
it on a large cluster.
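
As a rough illustration of the polling idea above, a minimal sketch assuming a plain scheduled executor; fetchAndForwardEvents is a hypothetical helper standing in for the "GET updated events and forward them to the app UI" step:

{code}
import java.util.concurrent.{Executors, TimeUnit}

// Hypothetical helper: pull any new events for one incomplete app and push them to its UI.
def fetchAndForwardEvents(appId: String): Unit = ()

val poller = Executors.newSingleThreadScheduledExecutor()
poller.scheduleWithFixedDelay(new Runnable {
  override def run(): Unit = fetchAndForwardEvents("application_0001")  // placeholder id
}, 0L, 30L, TimeUnit.SECONDS)  // ATS has no blocking reads, so poll on a schedule
{code}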

> YARN - Spark progress bar stucks at 10% but after finishing shows 100%
> --
>
> Key: SPARK-5925
> URL: https://issues.apache.org/jira/browse/SPARK-5925
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 1.2.1
>Reporter: Laszlo Fesus
>Priority: Minor
>
> I did set up a yarn cluster (CDH5) and spark (1.2.1), and also started Spark 
> History Server. Now I am able to click on more details on yarn's web 
> interface and get redirected to the appropriate spark logs during both job 
> execution and also after the job has finished. 
> My only concern is that while a Spark job is being executed (either 
> yarn-client or yarn-cluster), the progress bar gets stuck at 10% and doesn't 
> increase as it does for MapReduce jobs. After finishing, it shows 100% properly, 
> but we are losing the real-time tracking capability of the status bar. 
> I also tested the YARN RESTful web interface, and it again reports 10% during 
> (YARN) Spark job execution, and works well after finishing. (I suppose for the 
> time being I should have a look at Spark Job Server and see if it's possible to 
> track the job via its RESTful web interface.)
> Did anyone else experience this behaviour? Thanks in advance.






[jira] [Commented] (SPARK-6882) Spark ThriftServer2 Kerberos failed encountering java.lang.IllegalArgumentException: Unknown auth type: null Allowed values are: [auth-int, auth-conf, auth]

2015-10-02 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941116#comment-14941116
 ] 

Steve Loughran commented on SPARK-6882:
---

This will have gone away with Spark 1.5 and the upgrade of hive done in 
SPARK-8064; closing as fixed. 

> Spark ThriftServer2 Kerberos failed encountering 
> java.lang.IllegalArgumentException: Unknown auth type: null Allowed values 
> are: [auth-int, auth-conf, auth]
> 
>
> Key: SPARK-6882
> URL: https://issues.apache.org/jira/browse/SPARK-6882
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.1, 1.3.0, 1.4.0
> Environment: * Apache Hadoop 2.4.1 with Kerberos Enabled
> * Apache Hive 0.13.1
> * Spark 1.2.1 git commit b6eaf77d4332bfb0a698849b1f5f917d20d70e97
> * Spark 1.3.0 rc1 commit label 0dcb5d9f31b713ed90bcec63ebc4e530cbb69851
>Reporter: Andrew Lee
>
> When Kerberos is enabled, I get the following exceptions. 
> {code}
> 2015-03-13 18:26:05,363 ERROR 
> org.apache.hive.service.cli.thrift.ThriftCLIService 
> (ThriftBinaryCLIService.java:run(93)) - Error: 
> java.lang.IllegalArgumentException: Unknown auth type: null Allowed values 
> are: [auth-int, auth-conf, auth]
> {code}
> I tried it in
> * Spark 1.2.1 git commit b6eaf77d4332bfb0a698849b1f5f917d20d70e97
> * Spark 1.3.0 rc1 commit label 0dcb5d9f31b713ed90bcec63ebc4e530cbb69851
> with
> * Apache Hive 0.13.1
> * Apache Hadoop 2.4.1
> Build command
> {code}
> mvn -U -X -Phadoop-2.4 -Pyarn -Phive -Phive-0.13.1 -Phive-thriftserver 
> -Dhadoop.version=2.4.1 -Dyarn.version=2.4.1 -Dhive.version=0.13.1 -DskipTests 
> install
> {code}
> When starting Spark ThriftServer in {{yarn-client}} mode, the command to 
> start thriftserver looks like this
> {code}
> ./start-thriftserver.sh --hiveconf hive.server2.thrift.port=2 --hiveconf 
> hive.server2.thrift.bind.host=$(hostname) --master yarn-client
> {code}
> {{hostname}} points to the current hostname of the machine I'm using.
> Error message in {{spark.log}} from Spark 1.2.1 (1.2 rc1)
> {code}
> 2015-03-13 18:26:05,363 ERROR 
> org.apache.hive.service.cli.thrift.ThriftCLIService 
> (ThriftBinaryCLIService.java:run(93)) - Error: 
> java.lang.IllegalArgumentException: Unknown auth type: null Allowed values 
> are: [auth-int, auth-conf, auth]
> at org.apache.hive.service.auth.SaslQOP.fromString(SaslQOP.java:56)
> at 
> org.apache.hive.service.auth.HiveAuthFactory.getSaslProperties(HiveAuthFactory.java:118)
> at 
> org.apache.hive.service.auth.HiveAuthFactory.getAuthTransFactory(HiveAuthFactory.java:133)
> at 
> org.apache.hive.service.cli.thrift.ThriftBinaryCLIService.run(ThriftBinaryCLIService.java:43)
> at java.lang.Thread.run(Thread.java:744)
> {code}
> I'm wondering if this is due to the same problem described in HIVE-8154 and 
> HIVE-7620, because of an older code base for the Spark ThriftServer?
> Any insights are appreciated. Currently, I can't get Spark ThriftServer2 to 
> run against a Kerberos cluster (Apache 2.4.1).
> My hive-site.xml looks like the following for spark/conf.
> The kerberos keytab and tgt are configured correctly, I'm able to connect to 
> metastore, but the subsequent steps failed due to the exception.
> {code}
> <property>
>   <name>hive.semantic.analyzer.factory.impl</name>
>   <value>org.apache.hcatalog.cli.HCatSemanticAnalyzerFactory</value>
> </property>
> <property>
>   <name>hive.metastore.execute.setugi</name>
>   <value>true</value>
> </property>
> <property>
>   <name>hive.stats.autogather</name>
>   <value>false</value>
> </property>
> <property>
>   <name>hive.session.history.enabled</name>
>   <value>true</value>
> </property>
> <property>
>   <name>hive.querylog.location</name>
>   <value>/tmp/home/hive/log/${user.name}</value>
> </property>
> <property>
>   <name>hive.exec.local.scratchdir</name>
>   <value>/tmp/hive/scratch/${user.name}</value>
> </property>
> <property>
>   <name>hive.metastore.uris</name>
>   <value>thrift://somehostname:9083</value>
> </property>
> <property>
>   <name>hive.server2.authentication</name>
>   <value>KERBEROS</value>
> </property>
> <property>
>   <name>hive.server2.authentication.kerberos.principal</name>
>   <value>***</value>
> </property>
> <property>
>   <name>hive.server2.authentication.kerberos.keytab</name>
>   <value>***</value>
> </property>
> <property>
>   <name>hive.server2.thrift.sasl.qop</name>
>   <value>auth</value>
>   <description>Sasl QOP value; one of 'auth', 'auth-int' and 'auth-conf'</description>
> </property>
> <property>
>   <name>hive.server2.enable.impersonation</name>
>   <description>Enable user impersonation for HiveServer2</description>
>   <value>true</value>
> </property>
> <property>
>   <name>hive.metastore.sasl.enabled</name>
>   <value>true</value>
> </property>
> <property>
>   <name>hive.metastore.kerberos.keytab.file</name>
>   <value>***</value>
> </property>
> <property>
>   <name>hive.metastore.kerberos.principal</name>
>   <value>***</value>
> </property>
> <property>
>   <name>hive.metastore.cache.pinobjtypes</name>
>   <value>Table,Database,Type,FieldSchema,Order</value>
> </property>
> <property>
>   <name>hdfs_sentinel_file</name>
>   <value>***</value>
> </property>
> <property>
>   <name>hive.metastore.warehouse.dir</name>
>   <value>/hive</value>
> </property>
> <property>
>   <name>hive.metastore.client.socket.timeout</name>
>   <value>600</value>
> </property>
> <property>
>   <name>hive.warehouse.subdir.inherit.perms</name>
>   <value>true</value>
> </property>
> {code}
> Here I'm attaching more detailed logs from Spark 1.3 rc1.
> {code}
> 2015-04-13 16:37:20,688 INFO  org.apache.hadoop.security.UserGroupInformation 
> (UserGroupInformation.

[jira] [Resolved] (SPARK-6882) Spark ThriftServer2 Kerberos failed encountering java.lang.IllegalArgumentException: Unknown auth type: null Allowed values are: [auth-int, auth-conf, auth]

2015-10-02 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved SPARK-6882.
---
   Resolution: Fixed
Fix Version/s: 1.5.1

> Spark ThriftServer2 Kerberos failed encountering 
> java.lang.IllegalArgumentException: Unknown auth type: null Allowed values 
> are: [auth-int, auth-conf, auth]
> 
>
> Key: SPARK-6882
> URL: https://issues.apache.org/jira/browse/SPARK-6882
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.1, 1.3.0, 1.4.0
> Environment: * Apache Hadoop 2.4.1 with Kerberos Enabled
> * Apache Hive 0.13.1
> * Spark 1.2.1 git commit b6eaf77d4332bfb0a698849b1f5f917d20d70e97
> * Spark 1.3.0 rc1 commit label 0dcb5d9f31b713ed90bcec63ebc4e530cbb69851
>Reporter: Andrew Lee
> Fix For: 1.5.1
>
>
> When Kerberos is enabled, I get the following exceptions. 
> {code}
> 2015-03-13 18:26:05,363 ERROR 
> org.apache.hive.service.cli.thrift.ThriftCLIService 
> (ThriftBinaryCLIService.java:run(93)) - Error: 
> java.lang.IllegalArgumentException: Unknown auth type: null Allowed values 
> are: [auth-int, auth-conf, auth]
> {code}
> I tried it in
> * Spark 1.2.1 git commit b6eaf77d4332bfb0a698849b1f5f917d20d70e97
> * Spark 1.3.0 rc1 commit label 0dcb5d9f31b713ed90bcec63ebc4e530cbb69851
> with
> * Apache Hive 0.13.1
> * Apache Hadoop 2.4.1
> Build command
> {code}
> mvn -U -X -Phadoop-2.4 -Pyarn -Phive -Phive-0.13.1 -Phive-thriftserver 
> -Dhadoop.version=2.4.1 -Dyarn.version=2.4.1 -Dhive.version=0.13.1 -DskipTests 
> install
> {code}
> When starting Spark ThriftServer in {{yarn-client}} mode, the command to 
> start thriftserver looks like this
> {code}
> ./start-thriftserver.sh --hiveconf hive.server2.thrift.port=2 --hiveconf 
> hive.server2.thrift.bind.host=$(hostname) --master yarn-client
> {code}
> {{hostname}} points to the current hostname of the machine I'm using.
> Error message in {{spark.log}} from Spark 1.2.1 (1.2 rc1)
> {code}
> 2015-03-13 18:26:05,363 ERROR 
> org.apache.hive.service.cli.thrift.ThriftCLIService 
> (ThriftBinaryCLIService.java:run(93)) - Error: 
> java.lang.IllegalArgumentException: Unknown auth type: null Allowed values 
> are: [auth-int, auth-conf, auth]
> at org.apache.hive.service.auth.SaslQOP.fromString(SaslQOP.java:56)
> at 
> org.apache.hive.service.auth.HiveAuthFactory.getSaslProperties(HiveAuthFactory.java:118)
> at 
> org.apache.hive.service.auth.HiveAuthFactory.getAuthTransFactory(HiveAuthFactory.java:133)
> at 
> org.apache.hive.service.cli.thrift.ThriftBinaryCLIService.run(ThriftBinaryCLIService.java:43)
> at java.lang.Thread.run(Thread.java:744)
> {code}
> I'm wondering if this is due to the same problem described in HIVE-8154 and 
> HIVE-7620, because of an older code base for the Spark ThriftServer?
> Any insights are appreciated. Currently, I can't get Spark ThriftServer2 to 
> run against a Kerberos cluster (Apache 2.4.1).
> My hive-site.xml looks like the following for spark/conf.
> The kerberos keytab and tgt are configured correctly, I'm able to connect to 
> metastore, but the subsequent steps failed due to the exception.
> {code}
> <property>
>   <name>hive.semantic.analyzer.factory.impl</name>
>   <value>org.apache.hcatalog.cli.HCatSemanticAnalyzerFactory</value>
> </property>
> <property>
>   <name>hive.metastore.execute.setugi</name>
>   <value>true</value>
> </property>
> <property>
>   <name>hive.stats.autogather</name>
>   <value>false</value>
> </property>
> <property>
>   <name>hive.session.history.enabled</name>
>   <value>true</value>
> </property>
> <property>
>   <name>hive.querylog.location</name>
>   <value>/tmp/home/hive/log/${user.name}</value>
> </property>
> <property>
>   <name>hive.exec.local.scratchdir</name>
>   <value>/tmp/hive/scratch/${user.name}</value>
> </property>
> <property>
>   <name>hive.metastore.uris</name>
>   <value>thrift://somehostname:9083</value>
> </property>
> <property>
>   <name>hive.server2.authentication</name>
>   <value>KERBEROS</value>
> </property>
> <property>
>   <name>hive.server2.authentication.kerberos.principal</name>
>   <value>***</value>
> </property>
> <property>
>   <name>hive.server2.authentication.kerberos.keytab</name>
>   <value>***</value>
> </property>
> <property>
>   <name>hive.server2.thrift.sasl.qop</name>
>   <value>auth</value>
>   <description>Sasl QOP value; one of 'auth', 'auth-int' and 'auth-conf'</description>
> </property>
> <property>
>   <name>hive.server2.enable.impersonation</name>
>   <description>Enable user impersonation for HiveServer2</description>
>   <value>true</value>
> </property>
> <property>
>   <name>hive.metastore.sasl.enabled</name>
>   <value>true</value>
> </property>
> <property>
>   <name>hive.metastore.kerberos.keytab.file</name>
>   <value>***</value>
> </property>
> <property>
>   <name>hive.metastore.kerberos.principal</name>
>   <value>***</value>
> </property>
> <property>
>   <name>hive.metastore.cache.pinobjtypes</name>
>   <value>Table,Database,Type,FieldSchema,Order</value>
> </property>
> <property>
>   <name>hdfs_sentinel_file</name>
>   <value>***</value>
> </property>
> <property>
>   <name>hive.metastore.warehouse.dir</name>
>   <value>/hive</value>
> </property>
> <property>
>   <name>hive.metastore.client.socket.timeout</name>
>   <value>600</value>
> </property>
> <property>
>   <name>hive.warehouse.subdir.inherit.perms</name>
>   <value>true</value>
> </property>
> {code}
> Here I'm attaching more detailed logs from Spark 1.3 rc1.
> {code}
> 2015-04-13 16:37:20,688 INFO  org.apache.hadoop.security.UserGroupInformation 
> (UserGroupInformation.java:loginUserFromKeytab(893)) - Login successful for 
> user hiveserver/alee-v

[jira] [Commented] (SPARK-6951) History server slow startup if the event log directory is large

2015-10-02 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941121#comment-14941121
 ] 

Steve Loughran commented on SPARK-6951:
---

There's a way to address this that I've been thinking of for the SPARK-1537 
integration, where there's the extra dependency on the timeline server running.

Essentially, it looks like the bus replay operation can be asynchronous: the UI 
is returned immediately while some (pooled) thread replays the event log.
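
A minimal sketch of that asynchronous shape, with hypothetical replayEventLog/attachToUI helpers standing in for the real history provider internals:

{code}
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

implicit val replayPool: ExecutionContext =
  ExecutionContext.fromExecutor(Executors.newFixedThreadPool(4))

// Hypothetical stand-ins for parsing one event log and wiring the result into the app UI.
def replayEventLog(appId: String): Seq[String] = Seq.empty
def attachToUI(appId: String, events: Seq[String]): Unit = ()

def loadAppUI(appId: String): Unit = {
  // Return to the caller immediately; a pooled thread replays the log and fills in the UI.
  Future(replayEventLog(appId)).foreach(events => attachToUI(appId, events))
}
{code}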

> History server slow startup if the event log directory is large
> ---
>
> Key: SPARK-6951
> URL: https://issues.apache.org/jira/browse/SPARK-6951
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.3.0
>Reporter: Matt Cheah
>
> I started my history server, then navigated to the web UI where I expected to 
> be able to view some completed applications, but the webpage was not 
> available. It turned out that the History Server was not finished parsing all 
> of the event logs in the event log directory that I had specified. I had 
> accumulated a lot of event logs from months of running Spark, so it would 
> have taken a very long time for the History Server to crunch through them 
> all. I purged the event log directory and started from scratch, and the UI 
> loaded immediately.
> We should have a pagination strategy or parse the directory lazily to avoid 
> needing to wait after starting the history server.






[jira] [Commented] (SPARK-6108) No application number limit in spark history server

2015-10-02 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941125#comment-14941125
 ] 

Steve Loughran commented on SPARK-6108:
---

If this is an issue, then having a time window is probably better than a limit on 
the number of files, though care would be needed to avoid deleting the histories 
of running applications.
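
For what it's worth, a time-window approach roughly corresponds to the event-log cleaner settings; a sketch of the keys involved, which would normally live in the history server's spark-defaults.conf rather than application code:

{code}
// Shown as a SparkConf only for readability; these are history-server daemon settings.
val historyConf = new org.apache.spark.SparkConf()
  .set("spark.history.fs.cleaner.enabled", "true")  // periodically delete old event logs
  .set("spark.history.fs.cleaner.maxAge", "7d")     // retention window
{code}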

> No application number limit in spark history server
> ---
>
> Key: SPARK-6108
> URL: https://issues.apache.org/jira/browse/SPARK-6108
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.2.1
>Reporter: Xia Hu
>Priority: Minor
>
> There isn't a limit on the number of applications in the Spark history server. The 
> only limit I found is "spark.history.retainedApplications", but that one only 
> controls how many apps can be kept in memory. 
> I think a limit on the number of history applications is needed, because if the 
> number gets too big it can be inconvenient for both HDFS and the history server.






[jira] [Created] (SPARK-10910) spark.{executor,driver}.userClassPathFirst don't work for kryo (probably others)

2015-10-02 Thread Thomas Graves (JIRA)
Thomas Graves created SPARK-10910:
-

 Summary: spark.{executor,driver}.userClassPathFirst don't work for 
kryo (probably others)
 Key: SPARK-10910
 URL: https://issues.apache.org/jira/browse/SPARK-10910
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, YARN
Affects Versions: 1.5.1
Reporter: Thomas Graves


Trying to use spark.{executor,driver}.userClassPathFirst to put a newer version 
of kryo in doesn't work.   Note I was running on YARN.

There is a bug in kryo 1.21 that spark is using which is fixed in kryo 1.24.  A 
customer tried to use the spark.{executor,driver}.userClassPathFirst to include 
the newer version of kryo but it threw the following exception:

15/09/29 21:36:43 ERROR yarn.ApplicationMaster: User class threw exception: 
java.lang.LinkageError: loader constraint violation: loader (instance of 
org/apache/spark/util/ChildFirstURLClassLoader) previously initiated loading 
for a different type with name "com/esotericsoftware/kryo/Kryo"
java.lang.LinkageError: loader constraint violation: loader (instance of 
org/apache/spark/util/ChildFirstURLClassLoader) previously initiated loading 
for a different type with name "com/esotericsoftware/kryo/Kryo"
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:760)


The issue here is that the Spark Driver instantiates a kryo class in SparkEnv:

 val serializer = instantiateClassFromConf[Serializer](
  "spark.serializer", "org.apache.spark.serializer.JavaSerializer")
logDebug(s"Using serializer: ${serializer.getClass}")

It uses whatever version is in the spark assembly jar.

Then on YARN, in the ApplicationMaster code, before it starts the user 
application, the user-classpath-first handling sets up the 
ChildFirstURLClassLoader, which is later used when kryo is needed. This tries 
to load the newer version of kryo from the user jar and throws the exception 
above.

I'm sure this could happen with any number of other classes that got loaded by 
Spark before we try to run the user application code.






[jira] [Commented] (SPARK-10910) spark.{executor,driver}.userClassPathFirst don't work for kryo (probably others)

2015-10-02 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941175#comment-14941175
 ] 

Sean Owen commented on SPARK-10910:
---

I suspect this is one of several libraries that simply can't be overridden this 
way, because of the way they are used internally in Spark. There is a 
classloader problem no matter which way you turn. I can't say I know there's no 
way to make it work, but my expectation is that this would not work.

> spark.{executor,driver}.userClassPathFirst don't work for kryo (probably 
> others)
> 
>
> Key: SPARK-10910
> URL: https://issues.apache.org/jira/browse/SPARK-10910
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>
> Trying to use spark.{executor,driver}.userClassPathFirst to put a newer 
> version of kryo in doesn't work.   Note I was running on YARN.
> There is a bug in kryo 1.21 that spark is using which is fixed in kryo 1.24.  
> A customer tried to use the spark.{executor,driver}.userClassPathFirst to 
> include the newer version of kryo but it threw the following exception:
> 15/09/29 21:36:43 ERROR yarn.ApplicationMaster: User class threw exception: 
> java.lang.LinkageError: loader constraint violation: loader (instance of 
> org/apache/spark/util/ChildFirstURLClassLoader) previously initiated loading 
> for a different type with name "com/esotericsoftware/kryo/Kryo"
> java.lang.LinkageError: loader constraint violation: loader (instance of 
> org/apache/spark/util/ChildFirstURLClassLoader) previously initiated loading 
> for a different type with name "com/esotericsoftware/kryo/Kryo"
> at java.lang.ClassLoader.defineClass1(Native Method)
> at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
> The issue here is that the Spark Driver instantiates a kryo class in SparkEnv:
>  val serializer = instantiateClassFromConf[Serializer](
>   "spark.serializer", "org.apache.spark.serializer.JavaSerializer")
> logDebug(s"Using serializer: ${serializer.getClass}")
> It uses whatever version is in the spark assembly jar.
> Then on YARN, in the ApplicationMaster code, before it starts the user 
> application, the user-classpath-first handling sets up the 
> ChildFirstURLClassLoader, which is later used when kryo is needed. This tries 
> to load the newer version of kryo from the user jar and throws the 
> exception above.
> I'm sure this could happen with any number of other classes that got loaded 
> by Spark before we try to run the user application code.






[jira] [Commented] (SPARK-10910) spark.{executor,driver}.userClassPathFirst don't work for kryo (probably others)

2015-10-02 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941209#comment-14941209
 ] 

Thomas Graves commented on SPARK-10910:
---

So I understand that perhaps this mechanism won't work for this, and it's also 
marked experimental. In this case spark.yarn.user.classpath.first worked 
because it puts the jar directly on the system classpath, and as Marcelo pointed 
out, extraClassPath probably would have worked also.

I think we should either have a way for users to override libraries provided with 
Spark, or shade them and make the user provide them. I just want to make sure 
we don't deprecate a mechanism that allows this in favor of something that 
doesn't.
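
A minimal sketch of those two workarounds as configuration, assuming a hypothetical path to the newer kryo jar; in practice these keys go in spark-defaults.conf or spark-submit --conf, since driver classpath settings must be in place before the driver JVM starts:

{code}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.yarn.user.classpath.first", "true")                 // YARN: user jars ahead of Spark's
  .set("spark.driver.extraClassPath", "/path/to/newer-kryo.jar")  // hypothetical jar path
  .set("spark.executor.extraClassPath", "/path/to/newer-kryo.jar")
{code}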

> spark.{executor,driver}.userClassPathFirst don't work for kryo (probably 
> others)
> 
>
> Key: SPARK-10910
> URL: https://issues.apache.org/jira/browse/SPARK-10910
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>
> Trying to use spark.{executor,driver}.userClassPathFirst to put a newer 
> version of kryo in doesn't work.   Note I was running on YARN.
> There is a bug in kryo 1.21 that spark is using which is fixed in kryo 1.24.  
> A customer tried to use the spark.{executor,driver}.userClassPathFirst to 
> include the newer version of kryo but it threw the following exception:
> 15/09/29 21:36:43 ERROR yarn.ApplicationMaster: User class threw exception: 
> java.lang.LinkageError: loader constraint violation: loader (instance of 
> org/apache/spark/util/ChildFirstURLClassLoader) previously initiated loading 
> for a different type with name "com/esotericsoftware/kryo/Kryo"
> java.lang.LinkageError: loader constraint violation: loader (instance of 
> org/apache/spark/util/ChildFirstURLClassLoader) previously initiated loading 
> for a different type with name "com/esotericsoftware/kryo/Kryo"
> at java.lang.ClassLoader.defineClass1(Native Method)
> at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
> The issue here is that the Spark Driver instantiates a kryo class in SparkEnv:
>  val serializer = instantiateClassFromConf[Serializer](
>   "spark.serializer", "org.apache.spark.serializer.JavaSerializer")
> logDebug(s"Using serializer: ${serializer.getClass}")
> It uses whatever version is in the spark assembly jar.
> Then on YARN, in the ApplicationMaster code, before it starts the user 
> application, the user-classpath-first handling sets up the 
> ChildFirstURLClassLoader, which is later used when kryo is needed. This tries 
> to load the newer version of kryo from the user jar and throws the 
> exception above.
> I'm sure this could happen with any number of other classes that got loaded 
> by Spark before we try to run the user application code.






[jira] [Comment Edited] (SPARK-10436) spark-submit overwrites spark.files defaults with the job script filename

2015-10-02 Thread Nakul Jindal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941225#comment-14941225
 ] 

Nakul Jindal edited comment on SPARK-10436 at 10/2/15 3:01 PM:
---

I am new to Spark and will be working on this.


was (Author: nakul02):
I am new to Spark and will take a look at it too.

> spark-submit overwrites spark.files defaults with the job script filename
> -
>
> Key: SPARK-10436
> URL: https://issues.apache.org/jira/browse/SPARK-10436
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 1.4.0
> Environment: Ubuntu, Spark 1.4.0 Standalone
>Reporter: axel dahl
>Priority: Minor
>  Labels: easyfix, feature
>
> In my spark-defaults.conf I have configured a set of libraries to be 
> uploaded to my Spark 1.4.0 Standalone cluster.  The entry appears as:
> spark.files  libarary.zip,file1.py,file2.py
> When I execute spark-submit -v test.py
> I see that spark-submit reads the defaults correctly, but that it overwrites 
> the "spark.files" default entry and replaces it with the name of the job 
> script, i.e. "test.py".
> This behavior doesn't seem intuitive.  test.py should be added to the Spark 
> working folder, but it should not overwrite the "spark.files" defaults.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10436) spark-submit overwrites spark.files defaults with the job script filename

2015-10-02 Thread Nakul Jindal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941225#comment-14941225
 ] 

Nakul Jindal commented on SPARK-10436:
--

I am new to Spark and will take a look at it too.

> spark-submit overwrites spark.files defaults with the job script filename
> -
>
> Key: SPARK-10436
> URL: https://issues.apache.org/jira/browse/SPARK-10436
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 1.4.0
> Environment: Ubuntu, Spark 1.4.0 Standalone
>Reporter: axel dahl
>Priority: Minor
>  Labels: easyfix, feature
>
> In my spark-defaults.conf I have configured a set of libraries to be 
> uploaded to my Spark 1.4.0 Standalone cluster.  The entry appears as:
> spark.files  libarary.zip,file1.py,file2.py
> When I execute spark-submit -v test.py
> I see that spark-submit reads the defaults correctly, but that it overwrites 
> the "spark.files" default entry and replaces it with the name of the job 
> script, i.e. "test.py".
> This behavior doesn't seem intuitive.  test.py should be added to the Spark 
> working folder, but it should not overwrite the "spark.files" defaults.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10909) Spark sql jdbc fails for Oracle NUMBER type columns

2015-10-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10909:


Assignee: Apache Spark

> Spark sql jdbc fails for Oracle NUMBER type columns
> ---
>
> Key: SPARK-10909
> URL: https://issues.apache.org/jira/browse/SPARK-10909
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
> Environment: Dev
>Reporter: Kostas papageorgopoulos
>Assignee: Apache Spark
>  Labels: jdbc, newbie, sql
> Fix For: 1.5.1
>
>
> When using Spark SQL to connect to Oracle and run a query, I get the 
> following exception: "requirement failed: Overflowed precision". This is 
> triggered when the dbtable definition includes an Oracle NUMBER column.
> {code}
> SQLContext sqlContext = new SQLContext(sc);
> Map<String, String> options = new HashMap<>();
> options.put("driver", "oracle.jdbc.OracleDriver");
> options.put("user", "USER");
> options.put("password", "PASS");
> options.put("url", "ORACLE CONNECTION URL");
> options.put("dbtable", "(select VARCHAR_COLUMN, TIMESTAMP_COLUMN, NUMBER_COLUMN from lsc_subscription_profiles)");
> DataFrame jdbcDF = sqlContext.read().format("jdbc").options(options).load();
> jdbcDF.toJavaRDD().saveAsTextFile("hdfs://hdfshost:8020" + "/path/to/write.bz2", BZip2Codec.class);
> {code}
> using driver
> {code}
> <dependency>
>   <groupId>com.oracle</groupId>
>   <artifactId>ojdbc6</artifactId>
>   <version>11.2.0.3.0</version>
> </dependency>
> {code}
> Using Sun Java JDK 1.8.0_51 along with Spring 4.
> The classpath of the JUnit run is
> {code}
> /home/kostas/dev2/tools/jdk1.8.0_51/bin/java 
> -agentlib:jdwp=transport=dt_socket,address=127.0.0.1:42901,suspend=y,server=n 
> -ea -Duser.timezone=Africa/Cairo -Dfile.encoding=UTF-8 -classpath 
> /home/kostas/dev2/tools/idea-IU-141.178.9/lib/idea_rt.jar:/home/kostas/dev2/tools/idea-IU-141.178.9/plugins/junit/lib/junit-rt.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/jfxswt.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/deploy.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/charsets.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/rt.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/javaws.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/jce.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/resources.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/plugin.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/jfr.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/jsse.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/management-agent.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/sunjce_provider.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/sunec.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/localedata.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/jfxrt.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/cldrdata.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/nashorn.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/zipfs.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/sunpkcs11.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/dnsns.jar:/home/kostas/dev2/projects/atlas_reporting/atlas-core/target/test-classes:/home/kostas/dev2/projects/atlas_reporting/atlas-core/target/classes:/home/kostas/.m2/repository/org/apache/spark/spark-core_2.10/1.5.0/spark-core_2.10-1.5.0.jar:/home/kostas/.m2/repository/org/apache/avro/avro-mapred/1.7.7/avro-mapred-1.7.7-hadoop2.jar:/home/kostas/.m2/repository/org/apache/avro/avro-ipc/1.7.7/avro-ipc-1.7.7.jar:/home/kostas/.m2/repository/org/apache/avro/avro-ipc/1.7.7/avro-ipc-1.7.7-tests.jar:/home/kostas/.m2/repository/com/twitter/chill_2.10/0.5.0/chill_2.10-0.5.0.jar:/home/kostas/.m2/repository/com/esotericsoftware/kryo/kryo/2.21/kryo-2.21.jar:/home/kostas/.m2/repository/com/esotericsoftware/reflectasm/reflectasm/1.07/reflectasm-1.07-shaded.jar:/home/kostas/.m2/repository/com/esotericsoftware/minlog/minlog/1.2/minlog-1.2.jar:/home/kostas/.m2/repository/org/objenesis/objenesis/1.2/objenesis-1.2.jar:/home/kostas/.m2/repository/com/twitter/chill-java/0.5.0/chill-java-0.5.0.jar:/home/kostas/.m2/repository/org/apache/spark/spark-launcher_2.10/1.5.0/spark-launcher_2.10-1.5.0.jar:/home/kostas/.m2/repository/org/apache/spark/spark-network-common_2.10/1.5.0/spark-network-common_2.10-1.5.0.jar:/home/kostas/.m2/repository/org/apache/spark/spark-network-shuffle_2.10/1.5.0/spark-network-shuffle_2.10-1.5.0.jar:/home/kostas/.m2/repository/org/apache/spark/spark-unsafe_2.10/1.5.0/spark-unsafe_2.10-1.5.0.jar:/home/kostas/.m2/repository/net/java/dev/jets3t/jets3t/0.7.1/jets3t-0.7.1.jar:/home/kostas/.m2/repository/org/apache/curator/curator-recipes/2.4.0/curator-recipes-2.4.0.jar:/home/kostas/.m2/repository/org/apache/curator/curator-framework/2.4.0/curator-framework

[jira] [Assigned] (SPARK-10909) Spark sql jdbc fails for Oracle NUMBER type columns

2015-10-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10909:


Assignee: (was: Apache Spark)

> Spark sql jdbc fails for Oracle NUMBER type columns
> ---
>
> Key: SPARK-10909
> URL: https://issues.apache.org/jira/browse/SPARK-10909
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
> Environment: Dev
>Reporter: Kostas papageorgopoulos
>  Labels: jdbc, newbie, sql
> Fix For: 1.5.1
>
>
> When using Spark SQL to connect to Oracle and run a query, I get the 
> following exception: "requirement failed: Overflowed precision". This is 
> triggered when the dbtable definition includes an Oracle NUMBER column.
> {code}
> SQLContext sqlContext = new SQLContext(sc);
> Map<String, String> options = new HashMap<>();
> options.put("driver", "oracle.jdbc.OracleDriver");
> options.put("user", "USER");
> options.put("password", "PASS");
> options.put("url", "ORACLE CONNECTION URL");
> options.put("dbtable", "(select VARCHAR_COLUMN, TIMESTAMP_COLUMN, NUMBER_COLUMN from lsc_subscription_profiles)");
> DataFrame jdbcDF = sqlContext.read().format("jdbc").options(options).load();
> jdbcDF.toJavaRDD().saveAsTextFile("hdfs://hdfshost:8020" + "/path/to/write.bz2", BZip2Codec.class);
> {code}
> using driver
> {code}
> <dependency>
>   <groupId>com.oracle</groupId>
>   <artifactId>ojdbc6</artifactId>
>   <version>11.2.0.3.0</version>
> </dependency>
> {code}
> Using Sun Java JDK 1.8.0_51 along with Spring 4.
> The classpath of the JUnit run is
> {code}
> /home/kostas/dev2/tools/jdk1.8.0_51/bin/java 
> -agentlib:jdwp=transport=dt_socket,address=127.0.0.1:42901,suspend=y,server=n 
> -ea -Duser.timezone=Africa/Cairo -Dfile.encoding=UTF-8 -classpath 
> /home/kostas/dev2/tools/idea-IU-141.178.9/lib/idea_rt.jar:/home/kostas/dev2/tools/idea-IU-141.178.9/plugins/junit/lib/junit-rt.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/jfxswt.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/deploy.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/charsets.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/rt.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/javaws.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/jce.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/resources.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/plugin.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/jfr.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/jsse.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/management-agent.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/sunjce_provider.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/sunec.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/localedata.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/jfxrt.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/cldrdata.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/nashorn.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/zipfs.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/sunpkcs11.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/dnsns.jar:/home/kostas/dev2/projects/atlas_reporting/atlas-core/target/test-classes:/home/kostas/dev2/projects/atlas_reporting/atlas-core/target/classes:/home/kostas/.m2/repository/org/apache/spark/spark-core_2.10/1.5.0/spark-core_2.10-1.5.0.jar:/home/kostas/.m2/repository/org/apache/avro/avro-mapred/1.7.7/avro-mapred-1.7.7-hadoop2.jar:/home/kostas/.m2/repository/org/apache/avro/avro-ipc/1.7.7/avro-ipc-1.7.7.jar:/home/kostas/.m2/repository/org/apache/avro/avro-ipc/1.7.7/avro-ipc-1.7.7-tests.jar:/home/kostas/.m2/repository/com/twitter/chill_2.10/0.5.0/chill_2.10-0.5.0.jar:/home/kostas/.m2/repository/com/esotericsoftware/kryo/kryo/2.21/kryo-2.21.jar:/home/kostas/.m2/repository/com/esotericsoftware/reflectasm/reflectasm/1.07/reflectasm-1.07-shaded.jar:/home/kostas/.m2/repository/com/esotericsoftware/minlog/minlog/1.2/minlog-1.2.jar:/home/kostas/.m2/repository/org/objenesis/objenesis/1.2/objenesis-1.2.jar:/home/kostas/.m2/repository/com/twitter/chill-java/0.5.0/chill-java-0.5.0.jar:/home/kostas/.m2/repository/org/apache/spark/spark-launcher_2.10/1.5.0/spark-launcher_2.10-1.5.0.jar:/home/kostas/.m2/repository/org/apache/spark/spark-network-common_2.10/1.5.0/spark-network-common_2.10-1.5.0.jar:/home/kostas/.m2/repository/org/apache/spark/spark-network-shuffle_2.10/1.5.0/spark-network-shuffle_2.10-1.5.0.jar:/home/kostas/.m2/repository/org/apache/spark/spark-unsafe_2.10/1.5.0/spark-unsafe_2.10-1.5.0.jar:/home/kostas/.m2/repository/net/java/dev/jets3t/jets3t/0.7.1/jets3t-0.7.1.jar:/home/kostas/.m2/repository/org/apache/curator/curator-recipes/2.4.0/curator-recipes-2.4.0.jar:/home/kostas/.m2/repository/org/apache/curator/curator-framework/2.4.0/curator-framework-2.4.0.jar:/home/kostas/.

[jira] [Commented] (SPARK-10909) Spark sql jdbc fails for Oracle NUMBER type columns

2015-10-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941228#comment-14941228
 ] 

Apache Spark commented on SPARK-10909:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/8963

> Spark sql jdbc fails for Oracle NUMBER type columns
> ---
>
> Key: SPARK-10909
> URL: https://issues.apache.org/jira/browse/SPARK-10909
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
> Environment: Dev
>Reporter: Kostas papageorgopoulos
>  Labels: jdbc, newbie, sql
> Fix For: 1.5.1
>
>
> When using Spark SQL to connect to Oracle and run a query, I get the 
> following exception: "requirement failed: Overflowed precision". This is 
> triggered when the dbtable definition includes an Oracle NUMBER column.
> {code}
> SQLContext sqlContext = new SQLContext(sc);
> Map<String, String> options = new HashMap<>();
> options.put("driver", "oracle.jdbc.OracleDriver");
> options.put("user", "USER");
> options.put("password", "PASS");
> options.put("url", "ORACLE CONNECTION URL");
> options.put("dbtable", "(select VARCHAR_COLUMN, TIMESTAMP_COLUMN, NUMBER_COLUMN from lsc_subscription_profiles)");
> DataFrame jdbcDF = sqlContext.read().format("jdbc").options(options).load();
> jdbcDF.toJavaRDD().saveAsTextFile("hdfs://hdfshost:8020" + "/path/to/write.bz2", BZip2Codec.class);
> {code}
> using driver
> {code}
> <dependency>
>   <groupId>com.oracle</groupId>
>   <artifactId>ojdbc6</artifactId>
>   <version>11.2.0.3.0</version>
> </dependency>
> {code}
> Using Sun Java JDK 1.8.0_51 along with Spring 4.
> The classpath of the JUnit run is
> {code}
> /home/kostas/dev2/tools/jdk1.8.0_51/bin/java 
> -agentlib:jdwp=transport=dt_socket,address=127.0.0.1:42901,suspend=y,server=n 
> -ea -Duser.timezone=Africa/Cairo -Dfile.encoding=UTF-8 -classpath 
> /home/kostas/dev2/tools/idea-IU-141.178.9/lib/idea_rt.jar:/home/kostas/dev2/tools/idea-IU-141.178.9/plugins/junit/lib/junit-rt.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/jfxswt.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/deploy.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/charsets.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/rt.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/javaws.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/jce.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/resources.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/plugin.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/jfr.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/jsse.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/management-agent.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/sunjce_provider.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/sunec.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/localedata.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/jfxrt.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/cldrdata.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/nashorn.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/zipfs.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/sunpkcs11.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/dnsns.jar:/home/kostas/dev2/projects/atlas_reporting/atlas-core/target/test-classes:/home/kostas/dev2/projects/atlas_reporting/atlas-core/target/classes:/home/kostas/.m2/repository/org/apache/spark/spark-core_2.10/1.5.0/spark-core_2.10-1.5.0.jar:/home/kostas/.m2/repository/org/apache/avro/avro-mapred/1.7.7/avro-mapred-1.7.7-hadoop2.jar:/home/kostas/.m2/repository/org/apache/avro/avro-ipc/1.7.7/avro-ipc-1.7.7.jar:/home/kostas/.m2/repository/org/apache/avro/avro-ipc/1.7.7/avro-ipc-1.7.7-tests.jar:/home/kostas/.m2/repository/com/twitter/chill_2.10/0.5.0/chill_2.10-0.5.0.jar:/home/kostas/.m2/repository/com/esotericsoftware/kryo/kryo/2.21/kryo-2.21.jar:/home/kostas/.m2/repository/com/esotericsoftware/reflectasm/reflectasm/1.07/reflectasm-1.07-shaded.jar:/home/kostas/.m2/repository/com/esotericsoftware/minlog/minlog/1.2/minlog-1.2.jar:/home/kostas/.m2/repository/org/objenesis/objenesis/1.2/objenesis-1.2.jar:/home/kostas/.m2/repository/com/twitter/chill-java/0.5.0/chill-java-0.5.0.jar:/home/kostas/.m2/repository/org/apache/spark/spark-launcher_2.10/1.5.0/spark-launcher_2.10-1.5.0.jar:/home/kostas/.m2/repository/org/apache/spark/spark-network-common_2.10/1.5.0/spark-network-common_2.10-1.5.0.jar:/home/kostas/.m2/repository/org/apache/spark/spark-network-shuffle_2.10/1.5.0/spark-network-shuffle_2.10-1.5.0.jar:/home/kostas/.m2/repository/org/apache/spark/spark-unsafe_2.10/1.5.0/spark-unsafe_2.10-1.5.0.jar:/home/kostas/.m2/repository/net/java/dev/jets3t/jets3t/0.7.1/jets3t-0.7.1.jar:/home/kostas/.m2/repository/org/apache/curator/curator-recipes/2.4.0/curator-recipes-2.4.0.jar:/

[jira] [Commented] (SPARK-10909) Spark sql jdbc fails for Oracle NUMBER type columns

2015-10-02 Thread Liang-Chi Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941230#comment-14941230
 ] 

Liang-Chi Hsieh commented on SPARK-10909:
-

[~p02096] I created a PR for this problem. Since it is related to Oracle DB, it 
can't easily be covered by a unit test; can you test this patch and see if it 
works? Thanks.
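
Until the patch lands, one possible interim workaround (a sketch only, untested 
against Oracle here) is to bound the precision and scale of the unconstrained 
NUMBER column inside the pushed-down query, so the JDBC metadata reports 
something Spark can map to a DecimalType:

{code}
import java.util.HashMap;
import java.util.Map;

// Sketch: same options map as in the report, but with the NUMBER column cast
// to an explicit precision/scale before Spark reads the result set metadata.
Map<String, String> options = new HashMap<>();
options.put("dbtable",
    "(select VARCHAR_COLUMN, TIMESTAMP_COLUMN, "
  + "CAST(NUMBER_COLUMN AS NUMBER(38,10)) AS NUMBER_COLUMN "
  + "from lsc_subscription_profiles)");
// The driver/user/password/url options stay the same as in the report above.
{code}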

> Spark sql jdbc fails for Oracle NUMBER type columns
> ---
>
> Key: SPARK-10909
> URL: https://issues.apache.org/jira/browse/SPARK-10909
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
> Environment: Dev
>Reporter: Kostas papageorgopoulos
>  Labels: jdbc, newbie, sql
> Fix For: 1.5.1
>
>
> When using Spark SQL to connect to Oracle and run a query, I get the 
> following exception: "requirement failed: Overflowed precision". This is 
> triggered when the dbtable definition includes an Oracle NUMBER column.
> {code}
> SQLContext sqlContext = new SQLContext(sc);
> Map<String, String> options = new HashMap<>();
> options.put("driver", "oracle.jdbc.OracleDriver");
> options.put("user", "USER");
> options.put("password", "PASS");
> options.put("url", "ORACLE CONNECTION URL");
> options.put("dbtable", "(select VARCHAR_COLUMN, TIMESTAMP_COLUMN, NUMBER_COLUMN from lsc_subscription_profiles)");
> DataFrame jdbcDF = sqlContext.read().format("jdbc").options(options).load();
> jdbcDF.toJavaRDD().saveAsTextFile("hdfs://hdfshost:8020" + "/path/to/write.bz2", BZip2Codec.class);
> {code}
> using driver
> {code}
> <dependency>
>   <groupId>com.oracle</groupId>
>   <artifactId>ojdbc6</artifactId>
>   <version>11.2.0.3.0</version>
> </dependency>
> {code}
> Using Sun Java JDK 1.8.0_51 along with Spring 4.
> The classpath of the JUnit run is
> {code}
> /home/kostas/dev2/tools/jdk1.8.0_51/bin/java 
> -agentlib:jdwp=transport=dt_socket,address=127.0.0.1:42901,suspend=y,server=n 
> -ea -Duser.timezone=Africa/Cairo -Dfile.encoding=UTF-8 -classpath 
> /home/kostas/dev2/tools/idea-IU-141.178.9/lib/idea_rt.jar:/home/kostas/dev2/tools/idea-IU-141.178.9/plugins/junit/lib/junit-rt.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/jfxswt.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/deploy.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/charsets.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/rt.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/javaws.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/jce.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/resources.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/plugin.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/jfr.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/jsse.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/management-agent.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/sunjce_provider.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/sunec.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/localedata.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/jfxrt.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/cldrdata.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/nashorn.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/zipfs.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/sunpkcs11.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/dnsns.jar:/home/kostas/dev2/projects/atlas_reporting/atlas-core/target/test-classes:/home/kostas/dev2/projects/atlas_reporting/atlas-core/target/classes:/home/kostas/.m2/repository/org/apache/spark/spark-core_2.10/1.5.0/spark-core_2.10-1.5.0.jar:/home/kostas/.m2/repository/org/apache/avro/avro-mapred/1.7.7/avro-mapred-1.7.7-hadoop2.jar:/home/kostas/.m2/repository/org/apache/avro/avro-ipc/1.7.7/avro-ipc-1.7.7.jar:/home/kostas/.m2/repository/org/apache/avro/avro-ipc/1.7.7/avro-ipc-1.7.7-tests.jar:/home/kostas/.m2/repository/com/twitter/chill_2.10/0.5.0/chill_2.10-0.5.0.jar:/home/kostas/.m2/repository/com/esotericsoftware/kryo/kryo/2.21/kryo-2.21.jar:/home/kostas/.m2/repository/com/esotericsoftware/reflectasm/reflectasm/1.07/reflectasm-1.07-shaded.jar:/home/kostas/.m2/repository/com/esotericsoftware/minlog/minlog/1.2/minlog-1.2.jar:/home/kostas/.m2/repository/org/objenesis/objenesis/1.2/objenesis-1.2.jar:/home/kostas/.m2/repository/com/twitter/chill-java/0.5.0/chill-java-0.5.0.jar:/home/kostas/.m2/repository/org/apache/spark/spark-launcher_2.10/1.5.0/spark-launcher_2.10-1.5.0.jar:/home/kostas/.m2/repository/org/apache/spark/spark-network-common_2.10/1.5.0/spark-network-common_2.10-1.5.0.jar:/home/kostas/.m2/repository/org/apache/spark/spark-network-shuffle_2.10/1.5.0/spark-network-shuffle_2.10-1.5.0.jar:/home/kostas/.m2/repository/org/apache/spark/spark-unsafe_2.10/1.5.0/spark-unsafe_2.10-1.5.0.jar:/home/kostas/.m2/repository/net/java/dev/jets3t/jets3t/0.7.1/jets3t-0.7.1.jar:/home/kostas/.m

[jira] [Updated] (SPARK-10150) --force=true option is not working in beeline

2015-10-02 Thread Babulal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Babulal updated SPARK-10150:

Component/s: (was: SQL)

> --force=true option is not working in beeline 
> --
>
> Key: SPARK-10150
> URL: https://issues.apache.org/jira/browse/SPARK-10150
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.4.0
> Environment: Suse Linux,hadoop version 2.7,
>Reporter: Babulal
>Priority: Minor
>
> Start the thriftserver with the default configuration, then run beeline:
>  bin/beeline -u jdbc:hive2://10.19.92.183:1 -f Beeline/commands.txt --force=true --outputformat=csv
> commands.txt contains the commands:
> show tables;
> select max(key) as maxdid from xyz;
> select sumddd(key) as maxdid from xyz;
> select avg(key) as maxdid from xyz;
> The 3rd query is wrong and gives an error, but as the beeline --force option 
> suggests, beeline should go ahead and execute the 4th query; instead, 
> execution stopped at the 3rd query.
> Result:-



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10150) --force=true option is not working in beeline

2015-10-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-10150:
--
Component/s: SQL

[~Bjangir] this should have a component and SQL is closest.

> --force=true option is not working in beeline 
> --
>
> Key: SPARK-10150
> URL: https://issues.apache.org/jira/browse/SPARK-10150
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0
> Environment: Suse Linux,hadoop version 2.7,
>Reporter: Babulal
>Priority: Minor
>
> Start the thriftserver with the default configuration, then run beeline:
>  bin/beeline -u jdbc:hive2://10.19.92.183:1 -f Beeline/commands.txt --force=true --outputformat=csv
> commands.txt contains the commands:
> show tables;
> select max(key) as maxdid from xyz;
> select sumddd(key) as maxdid from xyz;
> select avg(key) as maxdid from xyz;
> The 3rd query is wrong and gives an error, but as the beeline --force option 
> suggests, beeline should go ahead and execute the 4th query; instead, 
> execution stopped at the 3rd query.
> Result:-



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10911) Executors should System.exit on clean shutdown

2015-10-02 Thread Thomas Graves (JIRA)
Thomas Graves created SPARK-10911:
-

 Summary: Executors should System.exit on clean shutdown
 Key: SPARK-10911
 URL: https://issues.apache.org/jira/browse/SPARK-10911
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.5.1
Reporter: Thomas Graves


Executors should call System.exit on clean shutdown to make sure all user 
threads exit and the JVM shuts down.

We ran into a case where an Executor was left around for days trying to 
shut down because the user code was using a non-daemon thread pool and one of 
those threads wasn't exiting.  We should force the JVM to go away with 
System.exit.
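
For illustration only (not from the report), a minimal self-contained sketch of 
the failure mode: a non-daemon thread pool keeps the JVM alive after main() 
returns unless something calls System.exit.

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class NonDaemonHangExample {
    public static void main(String[] args) {
        // Executors.newFixedThreadPool creates non-daemon threads by default,
        // so the JVM will not exit while this task keeps running.
        ExecutorService pool = Executors.newFixedThreadPool(1);
        pool.submit(() -> {
            while (true) {
                try {
                    Thread.sleep(60_000);
                } catch (InterruptedException e) {
                    return;
                }
            }
        });
        // main() returns here, but the process lingers indefinitely.
        // A System.exit(0) at the end of a clean shutdown would force it to terminate.
    }
}
{code}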





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6882) Spark ThriftServer2 Kerberos failed encountering java.lang.IllegalArgumentException: Unknown auth type: null Allowed values are: [auth-int, auth-conf, auth]

2015-10-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-6882:
-
Assignee: Steve Loughran

> Spark ThriftServer2 Kerberos failed encountering 
> java.lang.IllegalArgumentException: Unknown auth type: null Allowed values 
> are: [auth-int, auth-conf, auth]
> 
>
> Key: SPARK-6882
> URL: https://issues.apache.org/jira/browse/SPARK-6882
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.1, 1.3.0, 1.4.0
> Environment: * Apache Hadoop 2.4.1 with Kerberos Enabled
> * Apache Hive 0.13.1
> * Spark 1.2.1 git commit b6eaf77d4332bfb0a698849b1f5f917d20d70e97
> * Spark 1.3.0 rc1 commit label 0dcb5d9f31b713ed90bcec63ebc4e530cbb69851
>Reporter: Andrew Lee
>Assignee: Steve Loughran
> Fix For: 1.5.1
>
>
> When Kerberos is enabled, I get the following exceptions. 
> {code}
> 2015-03-13 18:26:05,363 ERROR 
> org.apache.hive.service.cli.thrift.ThriftCLIService 
> (ThriftBinaryCLIService.java:run(93)) - Error: 
> java.lang.IllegalArgumentException: Unknown auth type: null Allowed values 
> are: [auth-int, auth-conf, auth]
> {code}
> I tried it in
> * Spark 1.2.1 git commit b6eaf77d4332bfb0a698849b1f5f917d20d70e97
> * Spark 1.3.0 rc1 commit label 0dcb5d9f31b713ed90bcec63ebc4e530cbb69851
> with
> * Apache Hive 0.13.1
> * Apache Hadoop 2.4.1
> Build command
> {code}
> mvn -U -X -Phadoop-2.4 -Pyarn -Phive -Phive-0.13.1 -Phive-thriftserver 
> -Dhadoop.version=2.4.1 -Dyarn.version=2.4.1 -Dhive.version=0.13.1 -DskipTests 
> install
> {code}
> When starting Spark ThriftServer in {{yarn-client}} mode, the command to 
> start thriftserver looks like this
> {code}
> ./start-thriftserver.sh --hiveconf hive.server2.thrift.port=2 --hiveconf 
> hive.server2.thrift.bind.host=$(hostname) --master yarn-client
> {code}
> {{hostname}} points to the current hostname of the machine I'm using.
> Error message in {{spark.log}} from Spark 1.2.1 (1.2 rc1)
> {code}
> 2015-03-13 18:26:05,363 ERROR 
> org.apache.hive.service.cli.thrift.ThriftCLIService 
> (ThriftBinaryCLIService.java:run(93)) - Error: 
> java.lang.IllegalArgumentException: Unknown auth type: null Allowed values 
> are: [auth-int, auth-conf, auth]
> at org.apache.hive.service.auth.SaslQOP.fromString(SaslQOP.java:56)
> at 
> org.apache.hive.service.auth.HiveAuthFactory.getSaslProperties(HiveAuthFactory.java:118)
> at 
> org.apache.hive.service.auth.HiveAuthFactory.getAuthTransFactory(HiveAuthFactory.java:133)
> at 
> org.apache.hive.service.cli.thrift.ThriftBinaryCLIService.run(ThriftBinaryCLIService.java:43)
> at java.lang.Thread.run(Thread.java:744)
> {code}
> I'm wondering if this is due to the same problems described in HIVE-8154 and 
> HIVE-7620, because of an older code base for the Spark ThriftServer?
> Any insights are appreciated. Currently, I can't get Spark ThriftServer2 to 
> run against a Kerberos cluster (Apache 2.4.1).
> My hive-site.xml looks like the following for spark/conf.
> The kerberos keytab and tgt are configured correctly, I'm able to connect to 
> metastore, but the subsequent steps failed due to the exception.
> {code}
> <property>
>   <name>hive.semantic.analyzer.factory.impl</name>
>   <value>org.apache.hcatalog.cli.HCatSemanticAnalyzerFactory</value>
> </property>
> <property>
>   <name>hive.metastore.execute.setugi</name>
>   <value>true</value>
> </property>
> <property>
>   <name>hive.stats.autogather</name>
>   <value>false</value>
> </property>
> <property>
>   <name>hive.session.history.enabled</name>
>   <value>true</value>
> </property>
> <property>
>   <name>hive.querylog.location</name>
>   <value>/tmp/home/hive/log/${user.name}</value>
> </property>
> <property>
>   <name>hive.exec.local.scratchdir</name>
>   <value>/tmp/hive/scratch/${user.name}</value>
> </property>
> <property>
>   <name>hive.metastore.uris</name>
>   <value>thrift://somehostname:9083</value>
> </property>
> <property>
>   <name>hive.server2.authentication</name>
>   <value>KERBEROS</value>
> </property>
> <property>
>   <name>hive.server2.authentication.kerberos.principal</name>
>   <value>***</value>
> </property>
> <property>
>   <name>hive.server2.authentication.kerberos.keytab</name>
>   <value>***</value>
> </property>
> <property>
>   <name>hive.server2.thrift.sasl.qop</name>
>   <value>auth</value>
>   <description>Sasl QOP value; one of 'auth', 'auth-int' and 'auth-conf'</description>
> </property>
> <property>
>   <name>hive.server2.enable.impersonation</name>
>   <description>Enable user impersonation for HiveServer2</description>
>   <value>true</value>
> </property>
> <property>
>   <name>hive.metastore.sasl.enabled</name>
>   <value>true</value>
> </property>
> <property>
>   <name>hive.metastore.kerberos.keytab.file</name>
>   <value>***</value>
> </property>
> <property>
>   <name>hive.metastore.kerberos.principal</name>
>   <value>***</value>
> </property>
> <property>
>   <name>hive.metastore.cache.pinobjtypes</name>
>   <value>Table,Database,Type,FieldSchema,Order</value>
> </property>
> <property>
>   <name>hdfs_sentinel_file</name>
>   <value>***</value>
> </property>
> <property>
>   <name>hive.metastore.warehouse.dir</name>
>   <value>/hive</value>
> </property>
> <property>
>   <name>hive.metastore.client.socket.timeout</name>
>   <value>600</value>
> </property>
> <property>
>   <name>hive.warehouse.subdir.inherit.perms</name>
>   <value>true</value>
> </property>
> {code}
> Here, I'm attaching more detailed logs from Spark 1.3 rc1.
> {code}
> 2015-04-13 16:37:20,688 INFO  org.apache.hadoop.security.UserGroupInformation 
> (UserGroupInformation.java:loginUserFromKeytab(893)) - Login successful for 
> user hiveserver/a

[jira] [Updated] (SPARK-10865) [Spark SQL] [UDF] the ceil/ceiling function got wrong return value type

2015-10-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-10865:
--
Assignee: Cheng Hao

> [Spark SQL] [UDF] the ceil/ceiling function got wrong return value type
> ---
>
> Key: SPARK-10865
> URL: https://issues.apache.org/jira/browse/SPARK-10865
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Yi Zhou
>Assignee: Cheng Hao
> Fix For: 1.6.0
>
>
> As per the ceil/ceiling definition, it should return a BIGINT value:
> -ceil(DOUBLE a), ceiling(DOUBLE a)
> -Returns the minimum BIGINT value that is equal to or greater than a.
> But in the current Spark implementation, it returns the wrong value type.
> e.g.,
> select ceil(2642.12) from udf_test_web_sales limit 1;
> 2643.0
> In the Hive implementation, the return value type is as below:
> hive> select ceil(2642.12) from udf_test_web_sales limit 1;
> OK
> 2643
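
A quick way to see the reported mismatch from the DataFrame API (a sketch; it 
assumes an existing SQLContext named sqlContext and the same test table as above):

{code}
import org.apache.spark.sql.DataFrame;

// Sketch: per the Hive UDF contract the column should come back as bigint;
// the report above observes a double-typed result in Spark 1.5.0.
DataFrame result = sqlContext.sql(
    "select ceil(2642.12) as c from udf_test_web_sales limit 1");
result.printSchema(); // expected: c: bigint, observed: c: double
result.show();
{code}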



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10866) [Spark SQL] [UDF] the floor function got wrong return value type

2015-10-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-10866:
--
Assignee: Cheng Hao

> [Spark SQL] [UDF] the floor function got wrong return value type
> 
>
> Key: SPARK-10866
> URL: https://issues.apache.org/jira/browse/SPARK-10866
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Yi Zhou
>Assignee: Cheng Hao
> Fix For: 1.6.0
>
>
> As per the floor definition, it should return a BIGINT value:
> -floor(DOUBLE a)
> -Returns the maximum BIGINT value that is equal to or less than a.
> But in the current Spark implementation, it returns the wrong value type.
> e.g.,
> select floor(2642.12) from udf_test_web_sales limit 1;
> 2642.0
> In the Hive implementation, the return value type is as below:
> hive> select floor(2642.12) from udf_test_web_sales limit 1;
> OK
> 2642



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10911) Executors should System.exit on clean shutdown

2015-10-02 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941247#comment-14941247
 ] 

Thomas Graves commented on SPARK-10911:
---

Note that in this case I was running on YARN and it's using the 
CoarseGrainedExecutorBackend.

> Executors should System.exit on clean shutdown
> --
>
> Key: SPARK-10911
> URL: https://issues.apache.org/jira/browse/SPARK-10911
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>
> Executors should call System.exit on clean shutdown to make sure all user 
> threads exit and the JVM shuts down.
> We ran into a case where an Executor was left around for days trying to 
> shut down because the user code was using a non-daemon thread pool and one of 
> those threads wasn't exiting.  We should force the JVM to go away with 
> System.exit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10905) Export freqItems() for DataFrameStatFunctions in SparkR

2015-10-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-10905:
--
Target Version/s:   (was: 1.6.0)
Priority: Minor  (was: Major)
   Fix Version/s: (was: 1.6.0)

[~rerngvit] have a look at 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark  You 
shouldn't set Fix/Target version.

> Export freqItems() for DataFrameStatFunctions in SparkR
> ---
>
> Key: SPARK-10905
> URL: https://issues.apache.org/jira/browse/SPARK-10905
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 1.5.0
>Reporter: rerngvit yanggratoke
>Priority: Minor
>
> Currently only crosstab is implemented. This subtask is about adding the 
> freqItems() API to SparkR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10909) Spark sql jdbc fails for Oracle NUMBER type columns

2015-10-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-10909:
--
Target Version/s:   (was: 1.5.0)
Priority: Minor  (was: Major)
   Fix Version/s: (was: 1.5.1)

[~p02096] please read 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark first. 
This can't possibly Target 1.5.0, which is released, nor be fixed in 1.5.1, 
because it's not resolved.

> Spark sql jdbc fails for Oracle NUMBER type columns
> ---
>
> Key: SPARK-10909
> URL: https://issues.apache.org/jira/browse/SPARK-10909
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
> Environment: Dev
>Reporter: Kostas papageorgopoulos
>Priority: Minor
>  Labels: jdbc, newbie, sql
>
> When using Spark SQL to connect to Oracle and run a query, I get the 
> following exception: "requirement failed: Overflowed precision". This is 
> triggered when the dbtable definition includes an Oracle NUMBER column.
> {code}
> SQLContext sqlContext = new SQLContext(sc);
> Map<String, String> options = new HashMap<>();
> options.put("driver", "oracle.jdbc.OracleDriver");
> options.put("user", "USER");
> options.put("password", "PASS");
> options.put("url", "ORACLE CONNECTION URL");
> options.put("dbtable", "(select VARCHAR_COLUMN, TIMESTAMP_COLUMN, NUMBER_COLUMN from lsc_subscription_profiles)");
> DataFrame jdbcDF = sqlContext.read().format("jdbc").options(options).load();
> jdbcDF.toJavaRDD().saveAsTextFile("hdfs://hdfshost:8020" + "/path/to/write.bz2", BZip2Codec.class);
> {code}
> using driver
> {code}
> <dependency>
>   <groupId>com.oracle</groupId>
>   <artifactId>ojdbc6</artifactId>
>   <version>11.2.0.3.0</version>
> </dependency>
> {code}
> Using Sun Java JDK 1.8.0_51 along with Spring 4.
> The classpath of the JUnit run is
> {code}
> /home/kostas/dev2/tools/jdk1.8.0_51/bin/java 
> -agentlib:jdwp=transport=dt_socket,address=127.0.0.1:42901,suspend=y,server=n 
> -ea -Duser.timezone=Africa/Cairo -Dfile.encoding=UTF-8 -classpath 
> /home/kostas/dev2/tools/idea-IU-141.178.9/lib/idea_rt.jar:/home/kostas/dev2/tools/idea-IU-141.178.9/plugins/junit/lib/junit-rt.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/jfxswt.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/deploy.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/charsets.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/rt.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/javaws.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/jce.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/resources.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/plugin.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/jfr.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/jsse.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/management-agent.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/sunjce_provider.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/sunec.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/localedata.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/jfxrt.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/cldrdata.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/nashorn.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/zipfs.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/sunpkcs11.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/dnsns.jar:/home/kostas/dev2/projects/atlas_reporting/atlas-core/target/test-classes:/home/kostas/dev2/projects/atlas_reporting/atlas-core/target/classes:/home/kostas/.m2/repository/org/apache/spark/spark-core_2.10/1.5.0/spark-core_2.10-1.5.0.jar:/home/kostas/.m2/repository/org/apache/avro/avro-mapred/1.7.7/avro-mapred-1.7.7-hadoop2.jar:/home/kostas/.m2/repository/org/apache/avro/avro-ipc/1.7.7/avro-ipc-1.7.7.jar:/home/kostas/.m2/repository/org/apache/avro/avro-ipc/1.7.7/avro-ipc-1.7.7-tests.jar:/home/kostas/.m2/repository/com/twitter/chill_2.10/0.5.0/chill_2.10-0.5.0.jar:/home/kostas/.m2/repository/com/esotericsoftware/kryo/kryo/2.21/kryo-2.21.jar:/home/kostas/.m2/repository/com/esotericsoftware/reflectasm/reflectasm/1.07/reflectasm-1.07-shaded.jar:/home/kostas/.m2/repository/com/esotericsoftware/minlog/minlog/1.2/minlog-1.2.jar:/home/kostas/.m2/repository/org/objenesis/objenesis/1.2/objenesis-1.2.jar:/home/kostas/.m2/repository/com/twitter/chill-java/0.5.0/chill-java-0.5.0.jar:/home/kostas/.m2/repository/org/apache/spark/spark-launcher_2.10/1.5.0/spark-launcher_2.10-1.5.0.jar:/home/kostas/.m2/repository/org/apache/spark/spark-network-common_2.10/1.5.0/spark-network-common_2.10-1.5.0.jar:/home/kostas/.m2/repository/org/apache/spark/spark-network-shuffle_2.10/1.5.0/spark-network-shuffle_2.10-1.5.0.jar:/home/kostas/.m2/repository/org/apache/spark/spark-unsafe_2.10/1.5.0/spark-unsafe_2.10-1.5.0.jar:/home/kostas/

[jira] [Updated] (SPARK-10909) Spark sql jdbc fails for Oracle NUMBER type columns

2015-10-02 Thread Kostas papageorgopoulos (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kostas papageorgopoulos updated SPARK-10909:

Affects Version/s: (was: 1.5.0)
   1.5.1

> Spark sql jdbc fails for Oracle NUMBER type columns
> ---
>
> Key: SPARK-10909
> URL: https://issues.apache.org/jira/browse/SPARK-10909
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
> Environment: Dev
>Reporter: Kostas papageorgopoulos
>Priority: Minor
>  Labels: jdbc, newbie, sql
>
> When using Spark SQL to connect to Oracle and run a query, I get the 
> following exception: "requirement failed: Overflowed precision". This is 
> triggered when the dbtable definition includes an Oracle NUMBER column.
> {code}
> SQLContext sqlContext = new SQLContext(sc);
> Map<String, String> options = new HashMap<>();
> options.put("driver", "oracle.jdbc.OracleDriver");
> options.put("user", "USER");
> options.put("password", "PASS");
> options.put("url", "ORACLE CONNECTION URL");
> options.put("dbtable", "(select VARCHAR_COLUMN, TIMESTAMP_COLUMN, NUMBER_COLUMN from lsc_subscription_profiles)");
> DataFrame jdbcDF = sqlContext.read().format("jdbc").options(options).load();
> jdbcDF.toJavaRDD().saveAsTextFile("hdfs://hdfshost:8020" + "/path/to/write.bz2", BZip2Codec.class);
> {code}
> using driver
> {code}
> <dependency>
>   <groupId>com.oracle</groupId>
>   <artifactId>ojdbc6</artifactId>
>   <version>11.2.0.3.0</version>
> </dependency>
> {code}
> Using Sun Java JDK 1.8.0_51 along with Spring 4.
> The classpath of the JUnit run is
> {code}
> /home/kostas/dev2/tools/jdk1.8.0_51/bin/java 
> -agentlib:jdwp=transport=dt_socket,address=127.0.0.1:42901,suspend=y,server=n 
> -ea -Duser.timezone=Africa/Cairo -Dfile.encoding=UTF-8 -classpath 
> /home/kostas/dev2/tools/idea-IU-141.178.9/lib/idea_rt.jar:/home/kostas/dev2/tools/idea-IU-141.178.9/plugins/junit/lib/junit-rt.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/jfxswt.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/deploy.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/charsets.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/rt.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/javaws.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/jce.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/resources.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/plugin.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/jfr.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/jsse.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/management-agent.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/sunjce_provider.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/sunec.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/localedata.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/jfxrt.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/cldrdata.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/nashorn.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/zipfs.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/sunpkcs11.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/dnsns.jar:/home/kostas/dev2/projects/atlas_reporting/atlas-core/target/test-classes:/home/kostas/dev2/projects/atlas_reporting/atlas-core/target/classes:/home/kostas/.m2/repository/org/apache/spark/spark-core_2.10/1.5.0/spark-core_2.10-1.5.0.jar:/home/kostas/.m2/repository/org/apache/avro/avro-mapred/1.7.7/avro-mapred-1.7.7-hadoop2.jar:/home/kostas/.m2/repository/org/apache/avro/avro-ipc/1.7.7/avro-ipc-1.7.7.jar:/home/kostas/.m2/repository/org/apache/avro/avro-ipc/1.7.7/avro-ipc-1.7.7-tests.jar:/home/kostas/.m2/repository/com/twitter/chill_2.10/0.5.0/chill_2.10-0.5.0.jar:/home/kostas/.m2/repository/com/esotericsoftware/kryo/kryo/2.21/kryo-2.21.jar:/home/kostas/.m2/repository/com/esotericsoftware/reflectasm/reflectasm/1.07/reflectasm-1.07-shaded.jar:/home/kostas/.m2/repository/com/esotericsoftware/minlog/minlog/1.2/minlog-1.2.jar:/home/kostas/.m2/repository/org/objenesis/objenesis/1.2/objenesis-1.2.jar:/home/kostas/.m2/repository/com/twitter/chill-java/0.5.0/chill-java-0.5.0.jar:/home/kostas/.m2/repository/org/apache/spark/spark-launcher_2.10/1.5.0/spark-launcher_2.10-1.5.0.jar:/home/kostas/.m2/repository/org/apache/spark/spark-network-common_2.10/1.5.0/spark-network-common_2.10-1.5.0.jar:/home/kostas/.m2/repository/org/apache/spark/spark-network-shuffle_2.10/1.5.0/spark-network-shuffle_2.10-1.5.0.jar:/home/kostas/.m2/repository/org/apache/spark/spark-unsafe_2.10/1.5.0/spark-unsafe_2.10-1.5.0.jar:/home/kostas/.m2/repository/net/java/dev/jets3t/jets3t/0.7.1/jets3t-0.7.1.jar:/home/kostas/.m2/repository/org/apache/curator/curator-recipes/2.4.0/curator-recipes-2.4.0.jar:/home/kostas/.m2/repository/org/apache/curator/curator-framework/2.4.

[jira] [Commented] (SPARK-10909) Spark sql jdbc fails for Oracle NUMBER type columns

2015-10-02 Thread Kostas papageorgopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941257#comment-14941257
 ] 

Kostas papageorgopoulos commented on SPARK-10909:
-

You are correct. That was a big miss. I will read it fully before I open any 
other JIRA.

> Spark sql jdbc fails for Oracle NUMBER type columns
> ---
>
> Key: SPARK-10909
> URL: https://issues.apache.org/jira/browse/SPARK-10909
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
> Environment: Dev
>Reporter: Kostas papageorgopoulos
>Priority: Minor
>  Labels: jdbc, newbie, sql
>
> When using Spark SQL to connect to Oracle and run a query, I get the 
> following exception: "requirement failed: Overflowed precision". This is 
> triggered when the dbtable definition includes an Oracle NUMBER column.
> {code}
> SQLContext sqlContext = new SQLContext(sc);
> Map<String, String> options = new HashMap<>();
> options.put("driver", "oracle.jdbc.OracleDriver");
> options.put("user", "USER");
> options.put("password", "PASS");
> options.put("url", "ORACLE CONNECTION URL");
> options.put("dbtable", "(select VARCHAR_COLUMN, TIMESTAMP_COLUMN, NUMBER_COLUMN from lsc_subscription_profiles)");
> DataFrame jdbcDF = sqlContext.read().format("jdbc").options(options).load();
> jdbcDF.toJavaRDD().saveAsTextFile("hdfs://hdfshost:8020" + "/path/to/write.bz2", BZip2Codec.class);
> {code}
> using driver
> {code}
> <dependency>
>   <groupId>com.oracle</groupId>
>   <artifactId>ojdbc6</artifactId>
>   <version>11.2.0.3.0</version>
> </dependency>
> {code}
> Using Sun Java JDK 1.8.0_51 along with Spring 4.
> The classpath of the JUnit run is
> {code}
> /home/kostas/dev2/tools/jdk1.8.0_51/bin/java 
> -agentlib:jdwp=transport=dt_socket,address=127.0.0.1:42901,suspend=y,server=n 
> -ea -Duser.timezone=Africa/Cairo -Dfile.encoding=UTF-8 -classpath 
> /home/kostas/dev2/tools/idea-IU-141.178.9/lib/idea_rt.jar:/home/kostas/dev2/tools/idea-IU-141.178.9/plugins/junit/lib/junit-rt.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/jfxswt.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/deploy.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/charsets.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/rt.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/javaws.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/jce.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/resources.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/plugin.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/jfr.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/jsse.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/management-agent.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/sunjce_provider.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/sunec.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/localedata.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/jfxrt.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/cldrdata.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/nashorn.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/zipfs.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/sunpkcs11.jar:/home/kostas/dev2/tools/jdk1.8.0_51/jre/lib/ext/dnsns.jar:/home/kostas/dev2/projects/atlas_reporting/atlas-core/target/test-classes:/home/kostas/dev2/projects/atlas_reporting/atlas-core/target/classes:/home/kostas/.m2/repository/org/apache/spark/spark-core_2.10/1.5.0/spark-core_2.10-1.5.0.jar:/home/kostas/.m2/repository/org/apache/avro/avro-mapred/1.7.7/avro-mapred-1.7.7-hadoop2.jar:/home/kostas/.m2/repository/org/apache/avro/avro-ipc/1.7.7/avro-ipc-1.7.7.jar:/home/kostas/.m2/repository/org/apache/avro/avro-ipc/1.7.7/avro-ipc-1.7.7-tests.jar:/home/kostas/.m2/repository/com/twitter/chill_2.10/0.5.0/chill_2.10-0.5.0.jar:/home/kostas/.m2/repository/com/esotericsoftware/kryo/kryo/2.21/kryo-2.21.jar:/home/kostas/.m2/repository/com/esotericsoftware/reflectasm/reflectasm/1.07/reflectasm-1.07-shaded.jar:/home/kostas/.m2/repository/com/esotericsoftware/minlog/minlog/1.2/minlog-1.2.jar:/home/kostas/.m2/repository/org/objenesis/objenesis/1.2/objenesis-1.2.jar:/home/kostas/.m2/repository/com/twitter/chill-java/0.5.0/chill-java-0.5.0.jar:/home/kostas/.m2/repository/org/apache/spark/spark-launcher_2.10/1.5.0/spark-launcher_2.10-1.5.0.jar:/home/kostas/.m2/repository/org/apache/spark/spark-network-common_2.10/1.5.0/spark-network-common_2.10-1.5.0.jar:/home/kostas/.m2/repository/org/apache/spark/spark-network-shuffle_2.10/1.5.0/spark-network-shuffle_2.10-1.5.0.jar:/home/kostas/.m2/repository/org/apache/spark/spark-unsafe_2.10/1.5.0/spark-unsafe_2.10-1.5.0.jar:/home/kostas/.m2/repository/net/java/dev/jets3t/jets3t/0.7.1/jets3t-0.7.1.jar:/home/kostas/.m2/repository/org/apache/curator/curator-recipes/2.4.0/curator-recipes-

[jira] [Updated] (SPARK-10889) Upgrade Kinesis Client Library

2015-10-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-10889:
--
Affects Version/s: (was: 1.5.2)
   (was: 1.6.0)
   (was: 1.4.2)
   1.5.1

> Upgrade Kinesis Client Library
> --
>
> Key: SPARK-10889
> URL: https://issues.apache.org/jira/browse/SPARK-10889
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.5.1
>Reporter: Avrohom Katz
>Priority: Minor
>
> Kinesis Client Library added a custom CloudWatch metric in 1.3.0 called 
> MillisBehindLatest. This is very important for capacity planning and alerting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10889) Upgrade Kinesis Client Library

2015-10-02 Thread Avrohom Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941282#comment-14941282
 ] 

Avrohom Katz commented on SPARK-10889:
--

Is there a way to get this change into the 1.4.x line of releases?

> Upgrade Kinesis Client Library
> --
>
> Key: SPARK-10889
> URL: https://issues.apache.org/jira/browse/SPARK-10889
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.5.1
>Reporter: Avrohom Katz
>Priority: Minor
>
> Kinesis Client Library added a custom CloudWatch metric in 1.3.0 called 
> MillisBehindLatest. This is very important for capacity planning and alerting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7275) Make LogicalRelation public

2015-10-02 Thread Glenn Weidner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941300#comment-14941300
 ] 

Glenn Weidner commented on SPARK-7275:
--

I'll submit another pull request today.  Thanks!

> Make LogicalRelation public
> ---
>
> Key: SPARK-7275
> URL: https://issues.apache.org/jira/browse/SPARK-7275
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Santiago M. Mola
>Priority: Minor
>
> It seems LogicalRelation is the only part of the LogicalPlan that is not 
> public. This makes it harder to work with full logical plans from third party 
> packages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10912) Improve Spark metrics executor.filesystem

2015-10-02 Thread Yongjia Wang (JIRA)
Yongjia Wang created SPARK-10912:


 Summary: Improve Spark metrics executor.filesystem
 Key: SPARK-10912
 URL: https://issues.apache.org/jira/browse/SPARK-10912
 Project: Spark
  Issue Type: Improvement
Affects Versions: 1.5.0
Reporter: Yongjia Wang


org.apache.spark.executor.ExecutorSource has 2 filesystem metrics: "hdfs" 
and "file". I started using s3 as the persistent storage with a Spark standalone 
cluster in EC2, and s3 read/write metrics do not appear anywhere. The 'file' 
metric appears to cover only the driver reading local files. It would also be 
nice to report shuffle read/write metrics, so it can help understand things like 
whether a Spark job becomes IO bound.
I think these 2 things (s3 and shuffle) are very useful and would cover all the 
missing information about Spark IO, especially for an s3 setup.
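
As a sketch of what an additional per-scheme gauge could look like (this is not 
Spark's actual ExecutorSource code; it only assumes the Hadoop FileSystem 
statistics API and a Codahale MetricRegistry, which Spark's metrics system is 
built on):

{code}
import java.util.List;

import org.apache.hadoop.fs.FileSystem;

import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricRegistry;

public class SchemeBytesReadGauge {
    // Registers e.g. "filesystem.s3.read_bytes" backed by Hadoop's per-scheme statistics.
    public static void register(MetricRegistry registry, final String scheme) {
        registry.register("filesystem." + scheme + ".read_bytes", new Gauge<Long>() {
            @Override
            public Long getValue() {
                List<FileSystem.Statistics> allStats = FileSystem.getAllStatistics();
                for (FileSystem.Statistics stats : allStats) {
                    if (scheme.equals(stats.getScheme())) {
                        return stats.getBytesRead();
                    }
                }
                return 0L; // no filesystem activity recorded for this scheme yet
            }
        });
    }
}
{code}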



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10912) Improve Spark metrics executor.filesystem

2015-10-02 Thread Yongjia Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjia Wang updated SPARK-10912:
-
Description: 
In org.apache.spark.executor.ExecutorSource it has 2 filesystem metrics: "hdfs" 
and "file". I started using s3 as the persistent storage with Spark standalone 
cluster in EC2, and s3 read/write metrics do not appear anywhere. The 'file' 
metric appears to be only for driver reading local file, it would be nice to 
also report shuffle read/write metrics, so it can help with optimization.
I think these 2 things (s3 and shuffle) are very useful and cover all the 
missing information about Spark IO especially for s3 setup.

  was:
In org.apache.spark.executor.ExecutorSource it has 2 filesystem metrics: "hdfs" 
and "file". I started using s3 as the persistent storage with Spark standalone 
cluster in EC2, and s3 read/write metrics do not appear anywhere. The 'file' 
metric appears to be only for driver reading local file, it would be nice to 
also report shuffle read/write metrics, so it can help understand things like 
if a Spark job becomes IO bound.
I think these 2 things (s3 and shuffle) are very useful and cover all the 
missing information about Spark IO especially for s3 setup.


> Improve Spark metrics executor.filesystem
> -
>
> Key: SPARK-10912
> URL: https://issues.apache.org/jira/browse/SPARK-10912
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 1.5.0
>Reporter: Yongjia Wang
>
> In org.apache.spark.executor.ExecutorSource it has 2 filesystem metrics: 
> "hdfs" and "file". I started using s3 as the persistent storage with Spark 
> standalone cluster in EC2, and s3 read/write metrics do not appear anywhere. 
> The 'file' metric appears to be only for driver reading local file, it would 
> be nice to also report shuffle read/write metrics, so it can help with 
> optimization.
> I think these 2 things (s3 and shuffle) are very useful and cover all the 
> missing information about Spark IO especially for s3 setup.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10903) Make sqlContext global

2015-10-02 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941353#comment-14941353
 ] 

Shivaram Venkataraman commented on SPARK-10903:
---

I'd say let's have one common function to find the sqlContext (the hierarchy in 
toDF looks good to me) and use that everywhere. Basically, after calling 
sparkRSQL.init(sc) the user shouldn't care about the return value and should 
just use SparkR functions like read.df("/a/b.csv").

> Make sqlContext global 
> ---
>
> Key: SPARK-10903
> URL: https://issues.apache.org/jira/browse/SPARK-10903
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Narine Kokhlikyan
>Priority: Minor
>
> Make sqlContext global so that we don't have to always specify it.
> e.g. createDataFrame(iris) instead of createDataFrame(sqlContext, iris)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7499) Investigate how to specify columns in SparkR without $ or strings

2015-10-02 Thread Weiqiang Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941371#comment-14941371
 ] 

Weiqiang Zhuang commented on SPARK-7499:


Wondering if there is any update on this one? Thanks.

> Investigate how to specify columns in SparkR without $ or strings
> -
>
> Key: SPARK-7499
> URL: https://issues.apache.org/jira/browse/SPARK-7499
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>
> Right now in SparkR we need to specify the columns using `$` or strings. 
> For example, to run select we would do
> {code}
> df1 <- select(df, df$age > 10)
> {code}
> It would be good to infer the set of columns in a dataframe automatically and 
> resolve symbols for column names. For example
> {code} 
> df1 <- select(df, age > 10)
> {code}
> One way to do this is to build an environment mapping all the column names 
> to column handles and then use `substitute(arg, env = columnNameEnv)`
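
For illustration, a rough R sketch of that substitute-based idea (the helper 
name is made up, and it assumes df[[name]] returns the Column handle, so this 
is a sketch rather than SparkR's actual API):

{code}
# Build an environment mapping each column name to its Column handle, then
# evaluate the caller's unevaluated expression against that environment.
selectCols <- function(df, expr) {
  env <- new.env()
  for (name in columns(df)) {
    assign(name, df[[name]], envir = env)  # assumes df[[name]] yields a Column
  }
  resolved <- eval(substitute(expr), envir = env)
  select(df, resolved)
}

# With this, selectCols(df, age > 10) behaves like select(df, df$age > 10).
{code}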



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10894) Add 'drop' support for DataFrame's subset function

2015-10-02 Thread Weiqiang Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941375#comment-14941375
 ] 

Weiqiang Zhuang commented on SPARK-10894:
-

I guess it will be difficult to explain this to existing customers. BTW, dplyr 
references a column by its name instead of the '$' function. We should also be 
able to implement that with the substitute, eval and parse R functions. In fact, 
I just looked at this JIRA: https://issues.apache.org/jira/browse/SPARK-7499, 
which looks like the better approach. But maybe there are other aspects of the 
original design that prevent this from being done. 

> Add 'drop' support for DataFrame's subset function
> --
>
> Key: SPARK-10894
> URL: https://issues.apache.org/jira/browse/SPARK-10894
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Weiqiang Zhuang
>
> A SparkR DataFrame can be subset to get one or more columns of the dataset. The 
> current '[' implementation does not support 'drop' when it is asked for just one 
> column. This is not consistent with the R syntax:
> x[i, j, ... , drop = TRUE]
> # in R, when drop is FALSE, remain as data.frame
> > class(iris[, "Sepal.Width", drop=F])
> [1] "data.frame"
> # when drop is TRUE (default), drop to be a vector
> > class(iris[, "Sepal.Width", drop=T])
> [1] "numeric"
> > class(iris[,"Sepal.Width"])
> [1] "numeric"
> > df <- createDataFrame(sqlContext, iris)
> # in SparkR, 'drop' argument has no impact
> > class(df[,"Sepal_Width", drop=F])
> [1] "DataFrame"
> attr(,"package")
> [1] "SparkR"
> # should have dropped to be a Column class instead
> > class(df[,"Sepal_Width", drop=T])
> [1] "DataFrame"
> attr(,"package")
> [1] "SparkR"
> > class(df[,"Sepal_Width"])
> [1] "DataFrame"
> attr(,"package")
> [1] "SparkR"
> We should add the 'drop' support.
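
For illustration, a hedged R sketch of one way the '[' method could honor 
drop; the method signature and internals here are assumptions, not the actual 
SparkR source:

{code}
library(SparkR)  # the sketch assumes SparkR's DataFrame class and generics

# Return the Column itself when a single column is requested with drop = TRUE,
# otherwise keep returning a DataFrame. Row subsetting via `i` is omitted.
setMethod("[", signature(x = "DataFrame"),
          function(x, i, j, ..., drop = TRUE) {
            if (!missing(j) && is.character(j) && length(j) == 1 && drop) {
              return(x[[j]])  # drop to a Column, analogous to base R vectors
            }
            select(x, j)      # current behavior: stay a DataFrame
          })
{code}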



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10913) Add attach() function for DataFrame

2015-10-02 Thread Weiqiang Zhuang (JIRA)
Weiqiang Zhuang created SPARK-10913:
---

 Summary: Add attach() function for DataFrame
 Key: SPARK-10913
 URL: https://issues.apache.org/jira/browse/SPARK-10913
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Weiqiang Zhuang
Priority: Minor


Need an R-like attach() API: "Attach Set of R Objects to Search Path"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10913) Add attach() function for DataFrame

2015-10-02 Thread Weiqiang Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941393#comment-14941393
 ] 

Weiqiang Zhuang commented on SPARK-10913:
-

The attach() API is really helpful because each column of the DataFrame can then 
be directly accessed by name. I will be working on this.

> Add attach() function for DataFrame
> ---
>
> Key: SPARK-10913
> URL: https://issues.apache.org/jira/browse/SPARK-10913
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Weiqiang Zhuang
>Priority: Minor
>
> Need an R-like attach() API: "Attach Set of R Objects to Search Path"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9622) DecisionTreeRegressor: provide variance of prediction

2015-10-02 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-9622:
-
Shepherd: Joseph K. Bradley
Target Version/s: 1.6.0

> DecisionTreeRegressor: provide variance of prediction
> -
>
> Key: SPARK-9622
> URL: https://issues.apache.org/jira/browse/SPARK-9622
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Reporter: Joseph K. Bradley
>Assignee: Yanbo Liang
>Priority: Minor
>
> Variance of predicted value, as estimated from training data.
> Analogous to class probabilities for classification.
> See [SPARK-3727] for discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9798) CrossValidatorModel Documentation Improvements

2015-10-02 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-9798:
-
Assignee: rerngvit yanggratoke

> CrossValidatorModel Documentation Improvements
> --
>
> Key: SPARK-9798
> URL: https://issues.apache.org/jira/browse/SPARK-9798
> Project: Spark
>  Issue Type: Documentation
>  Components: ML
>Reporter: Feynman Liang
>Assignee: rerngvit yanggratoke
>Priority: Minor
>  Labels: starter
>
> CrossValidatorModel's avgMetrics and bestModel need documentation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-9798) CrossValidatorModel Documentation Improvements

2015-10-02 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley resolved SPARK-9798.
--
   Resolution: Fixed
Fix Version/s: 1.6.0

Issue resolved by pull request 8882
[https://github.com/apache/spark/pull/8882]

> CrossValidatorModel Documentation Improvements
> --
>
> Key: SPARK-9798
> URL: https://issues.apache.org/jira/browse/SPARK-9798
> Project: Spark
>  Issue Type: Documentation
>  Components: ML
>Reporter: Feynman Liang
>Assignee: rerngvit yanggratoke
>Priority: Minor
>  Labels: starter
> Fix For: 1.6.0
>
>
> CrossValidatorModel's avgMetrics and bestModel need documentation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10913) Add attach() function for DataFrame

2015-10-02 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941427#comment-14941427
 ] 

Shivaram Venkataraman commented on SPARK-10913:
---

Can this also address SPARK-7499, or are there major differences?

> Add attach() function for DataFrame
> ---
>
> Key: SPARK-10913
> URL: https://issues.apache.org/jira/browse/SPARK-10913
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Weiqiang Zhuang
>Priority: Minor
>
> Need an R-like attach() API: "Attach Set of R Objects to Search Path"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5890) Add QuantileDiscretizer

2015-10-02 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley resolved SPARK-5890.
--
   Resolution: Fixed
Fix Version/s: 1.6.0

Issue resolved by pull request 5779
[https://github.com/apache/spark/pull/5779]

> Add QuantileDiscretizer
> ---
>
> Key: SPARK-5890
> URL: https://issues.apache.org/jira/browse/SPARK-5890
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Reporter: Xiangrui Meng
>Assignee: Xusen Yin
> Fix For: 1.6.0
>
>
> A `QuantileDiscretizer` takes a column with continuous features and outputs a 
> column with binned categorical features.
> {code}
> val fd = new QuantileDiscretizer()
>   .setInputCol("age")
>   .setNumBins(32)
>   .setOutputCol("ageBins")
> {code}
> This should be an automatic feature discretizer, which uses a simple algorithm 
> like approximate quantiles to discretize features. It should set the ML 
> attribute correctly in the output column.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-6530) ChiSqSelector transformer

2015-10-02 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley resolved SPARK-6530.
--
   Resolution: Fixed
Fix Version/s: 1.6.0

Issue resolved by pull request 5742
[https://github.com/apache/spark/pull/5742]

> ChiSqSelector transformer
> -
>
> Key: SPARK-6530
> URL: https://issues.apache.org/jira/browse/SPARK-6530
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Reporter: Xusen Yin
>Assignee: Xusen Yin
> Fix For: 1.6.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5890) Add QuantileDiscretizer

2015-10-02 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941444#comment-14941444
 ] 

Joseph K. Bradley commented on SPARK-5890:
--

[~yinxusen] Could you please create JIRAs for: Python API + programming guide 
update?

> Add QuantileDiscretizer
> ---
>
> Key: SPARK-5890
> URL: https://issues.apache.org/jira/browse/SPARK-5890
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Reporter: Xiangrui Meng
>Assignee: Xusen Yin
> Fix For: 1.6.0
>
>
> A `QuantileDiscretizer` takes a column with continuous features and outputs a 
> column with binned categorical features.
> {code}
> val fd = new QuantileDiscretizer()
>   .setInputCol("age")
>   .setNumBins(32)
>   .setOutputCol("ageBins")
> {code}
> This should be an automatic feature discretizer, which uses a simple algorithm 
> like approximate quantiles to discretize features. It should set the ML 
> attribute correctly in the output column.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6530) ChiSqSelector transformer

2015-10-02 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941446#comment-14941446
 ] 

Joseph K. Bradley commented on SPARK-6530:
--

[~yinxusen] Could you please create JIRAs for: Python API + programming guide 
update?

> ChiSqSelector transformer
> -
>
> Key: SPARK-6530
> URL: https://issues.apache.org/jira/browse/SPARK-6530
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Reporter: Xusen Yin
>Assignee: Xusen Yin
> Fix For: 1.6.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10913) Add attach() function for DataFrame

2015-10-02 Thread Weiqiang Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941449#comment-14941449
 ] 

Weiqiang Zhuang commented on SPARK-10913:
-

I think they are different issues. If I understand correctly, SPARK-7499 deals 
with cases where, inside a function call on a DataFrame, a column of that 
DataFrame may be referenced by name. For example, in the select function, 
select(df, Sepal_Width), the first input is the DataFrame and the second input 
is one of its columns.

This issue, however, is to implement the attach() function, which will allow 
using an object (any column of a DataFrame) by name without specifying again 
which DataFrame the object is associated with. For example:

attach(df)
head(Sepal_Width)
summary(Sepal_Width)
detach()

The head() and summary() functions then know exactly where to find the object 
for Sepal_Width without it being specified as df[, "Sepal_Width"].
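
For illustration, a hedged R sketch of how such an attach() could be wired up 
via base R's search path (the function names and the use of df[[col]] to get a 
Column are assumptions, not the eventual SparkR implementation):

{code}
# Put one entry per column on the R search path, so that bare column names
# resolve to the corresponding Column of the attached DataFrame.
attachDF <- function(df, pos = 2L, name = "sparkr:df") {
  cols <- columns(df)
  colList <- setNames(lapply(cols, function(col) df[[col]]), cols)
  attach(colList, pos = pos, name = name)  # base R attach of a named list
}

detachDF <- function(name = "sparkr:df") {
  detach(name, character.only = TRUE)
}

# Usage, roughly mirroring the example above:
#   attachDF(df)
#   s <- select(df, Sepal_Width)  # Sepal_Width resolves via the search path
#   detachDF()
{code}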

> Add attach() function for DataFrame
> ---
>
> Key: SPARK-10913
> URL: https://issues.apache.org/jira/browse/SPARK-10913
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Weiqiang Zhuang
>Priority: Minor
>
> Need an R-like attach() API: "Attach Set of R Objects to Search Path"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10798) JsonMappingException with Spark Context Parallelize

2015-10-02 Thread Miao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941461#comment-14941461
 ] 

Miao Wang commented on SPARK-10798:
---

Hi Dev,

I am interested in working on this issue. Now I am reproducing the issue.

Thanks!

Miao

> JsonMappingException with Spark Context Parallelize
> ---
>
> Key: SPARK-10798
> URL: https://issues.apache.org/jira/browse/SPARK-10798
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.0
> Environment: Linux, Java 1.8.45
>Reporter: Dev Lakhani
>
> When trying to create an RDD of Rows using a Java Spark Context and if I 
> serialize the rows with Kryo first, the sparkContext fails.
> byte[] data= Kryo.serialize(List)
> List fromKryoRows=Kryo.unserialize(data)
> List rows= new Vector(); //using a new set of data.
> rows.add(RowFactory.create("test"));
> javaSparkContext.parallelize(rows);
> OR
> javaSparkContext.parallelize(fromKryoRows); //using deserialized rows
> I get :
> com.fasterxml.jackson.databind.JsonMappingException: (None,None) (of class 
> scala.Tuple2) (through reference chain: 
> org.apache.spark.rdd.RDDOperationScope["parent"])
>at 
> com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:210)
>at 
> com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:177)
>at 
> com.fasterxml.jackson.databind.ser.std.StdSerializer.wrapAndThrow(StdSerializer.java:187)
>at 
> com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:647)
>at 
> com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:152)
>at 
> com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:128)
>at 
> com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:2881)
>at 
> com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2338)
>at 
> org.apache.spark.rdd.RDDOperationScope.toJson(RDDOperationScope.scala:50)
>at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:141)
>at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
>at 
> org.apache.spark.SparkContext.withScope(SparkContext.scala:700)
>at 
> org.apache.spark.SparkContext.parallelize(SparkContext.scala:714)
>at 
> org.apache.spark.api.java.JavaSparkContext.parallelize(JavaSparkContext.scala:145)
>at 
> org.apache.spark.api.java.JavaSparkContext.parallelize(JavaSparkContext.scala:157)
>...
> Caused by: scala.MatchError: (None,None) (of class scala.Tuple2)
>at 
> com.fasterxml.jackson.module.scala.ser.OptionSerializer$$anonfun$serialize$1.apply$mcV$sp(OptionSerializerModule.scala:32)
>at 
> com.fasterxml.jackson.module.scala.ser.OptionSerializer$$anonfun$serialize$1.apply(OptionSerializerModule.scala:32)
>at 
> com.fasterxml.jackson.module.scala.ser.OptionSerializer$$anonfun$serialize$1.apply(OptionSerializerModule.scala:32)
>at scala.Option.getOrElse(Option.scala:120)
>at 
> com.fasterxml.jackson.module.scala.ser.OptionSerializer.serialize(OptionSerializerModule.scala:31)
>at 
> com.fasterxml.jackson.module.scala.ser.OptionSerializer.serialize(OptionSerializerModule.scala:22)
>at 
> com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:505)
>at 
> com.fasterxml.jackson.module.scala.ser.OptionPropertyWriter.serializeAsField(OptionSerializerModule.scala:128)
>at 
> com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:639)
>... 19 more
> I've tried updating jackson module scala to 2.6.1 but same issue. This 
> happens in local mode with java 1.8_45. I searched the web and this Jira for 
> similar issues but found nothing of interest.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10914) Incorrect empty join sets

2015-10-02 Thread Ben Moran (JIRA)
Ben Moran created SPARK-10914:
-

 Summary: Incorrect empty join sets
 Key: SPARK-10914
 URL: https://issues.apache.org/jira/browse/SPARK-10914
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.5.1, 1.5.0
 Environment: Ubuntu 14.04 (spark-slave), 12.04 (master)

Reporter: Ben Moran


Using an inner join to match two integer columns, I generally get no results 
when there should be matches. The results vary depending on whether the 
dataframes come from SQL, JSON, or cache, as well as on the order in which I 
cache things and query them.

This minimal example reproduces it consistently for me in the spark-shell, on 
new installs of both 1.5.0 and 1.5.1 (pre-built against Hadoop 2.6 from 
http://spark.apache.org/downloads.html.)

/* x is {"xx":1}{"xx":2} and y is just {"yy":1}{"yy:2} */
val x = sql("select 1 xx union all select 2") 
val y = sql("select 1 yy union all select 2")

x.join(y, $"xx" === $"yy").count() /* expect 2, get 0 */
/* If I cache both tables it works: */
x.cache()
y.cache()
x.join(y, $"xx" === $"yy").count() /* expect 2, get 2 */

/* but this still doesn't work: */
x.join(y, $"xx" === $"yy").filter("yy=1").count() /* expect 1, get 0 */




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8467) Add LDAModel.describeTopics() in Python

2015-10-02 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-8467:
-
Shepherd: Joseph K. Bradley
Assignee: Yu Ishikawa

> Add LDAModel.describeTopics() in Python
> ---
>
> Key: SPARK-8467
> URL: https://issues.apache.org/jira/browse/SPARK-8467
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib, PySpark
>Reporter: Yu Ishikawa
>Assignee: Yu Ishikawa
>
> Add LDAModel.describeTopics() in Python.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7275) Make LogicalRelation public

2015-10-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941541#comment-14941541
 ] 

Apache Spark commented on SPARK-7275:
-

User 'gweidner' has created a pull request for this issue:
https://github.com/apache/spark/pull/8965

> Make LogicalRelation public
> ---
>
> Key: SPARK-7275
> URL: https://issues.apache.org/jira/browse/SPARK-7275
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Santiago M. Mola
>Priority: Minor
>
> It seems LogicalRelation is the only part of the LogicalPlan that is not 
> public. This makes it harder to work with full logical plans from third party 
> packages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10893) Lag Analytic function broken

2015-10-02 Thread Jo Desmet (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941553#comment-14941553
 ] 

Jo Desmet commented on SPARK-10893:
---

Well, part of the problem is that it was not reproduced when spark-repl was 
used instead of spark-submit.

I am very unfamiliar with spark-repl and Scala, so I would like to avoid 
spending the time on learning and troubleshooting there.

I will try to be more specific about the conditions for reproducing it, 
including the exact spark-submit parameters.

> Lag Analytic function broken
> 
>
> Key: SPARK-10893
> URL: https://issues.apache.org/jira/browse/SPARK-10893
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.5.0
> Environment: Spark Standalone Cluster on Linux
>Reporter: Jo Desmet
>
> Trying to aggregate with the LAG Analytic function gives the wrong result. In 
> my testcase it was always giving the fixed value '103079215105' when I tried 
> to run on an integer.
> Note that this only happens on Spark 1.5.0, and only when running in cluster 
> mode.
> It works fine when running on Spark 1.4.1, or when running in local mode. 
> I did not test on a yarn cluster.
> I did not test other analytic aggregates.
> Input JSON:
> {code:borderStyle=solid}
> {"VAA":"A", "VBB":1}
> {"VAA":"B", "VBB":-1}
> {"VAA":"C", "VBB":2}
> {"VAA":"d", "VBB":3}
> {"VAA":null, "VBB":null}
> {code}
> Java:
> {code:borderStyle=solid}
> SparkContext sc = new SparkContext(conf);
> HiveContext sqlContext = new HiveContext(sc);
> DataFrame df = sqlContext.read().json(getInputPath("input.json"));
> 
> df = df.withColumn(
>   "previous",
>   lag(dataFrame.col("VBB"), 1)
> .over(Window.orderBy(dataFrame.col("VAA")))
>   );
> {code}
> Expected Result:
> {code:borderStyle=solid}
> {"VAA":null, "VBB":null, "previous":null}
> {"VAA":"A", "VBB":1, "previous":null}
> {"VAA":"B", "VBB":-1, "previous":1}
> {"VAA":"C", "VBB":2, "previous":-1}
> {"VAA":"d", "VBB":3, "previous":2}
> {code}
> Actual Result:
> {code:borderStyle=solid}
> {"VAA":null, "VBB":null, "previous":103079215105}
> {"VAA":"A", "VBB":1, "previous":103079215105}
> {"VAA":"B", "VBB":-1, "previous":103079215105}
> {"VAA":"C", "VBB":2, "previous":103079215105}
> {"VAA":"d", "VBB":3, "previous":103079215105}
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10893) Lag Analytic function broken

2015-10-02 Thread Jo Desmet (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jo Desmet updated SPARK-10893:
--
Description: 
Trying to aggregate with the LAG Analytic function gives the wrong result. In 
my testcase it was always giving the fixed value '103079215105' when I tried to 
run on an integer.
Note that this only happens on Spark 1.5.0, and only when running in cluster 
mode.
It works fine when running on Spark 1.4.1, or when running in local mode. 
I did not test on a yarn cluster.
I did not test other analytic aggregates.

Input JSON:
{code:borderStyle=solid}
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}
{code}

Java:
{code:borderStyle=solid}
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
  "previous",
  lag(dataFrame.col("VBB"), 1)
.over(Window.orderBy(dataFrame.col("VAA")))
  );
{code}

Submitting the job:
{code:borderStyle=solid}
spark-submit \
  --master spark:\\xx:7077 \
  --deploy-mode client \
  --class package.to.DriverClass \
  --driver-java-options -Dhdp.version=2.2.0.0–2041 \
  --num-executors 2 \
  --driver-memory 2g \
  --executor-memory 2g \
  --executor-cores 2 \
  /path/to/sample-program.jar
{code}

Expected Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":null}
{"VAA":"A", "VBB":1, "previous":null}
{"VAA":"B", "VBB":-1, "previous":1}
{"VAA":"C", "VBB":2, "previous":-1}
{"VAA":"d", "VBB":3, "previous":2}
{code}

Actual Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":103079215105}
{"VAA":"A", "VBB":1, "previous":103079215105}
{"VAA":"B", "VBB":-1, "previous":103079215105}
{"VAA":"C", "VBB":2, "previous":103079215105}
{"VAA":"d", "VBB":3, "previous":103079215105}
{code}





  was:
Trying to aggregate with the LAG Analytic function gives the wrong result. In 
my testcase it was always giving the fixed value '103079215105' when I tried to 
run on an integer.
Note that this only happens on Spark 1.5.0, and only when running in cluster 
mode.
It works fine when running on Spark 1.4.1, or when running in local mode. 
I did not test on a yarn cluster.
I did not test other analytic aggregates.

Input JSON:
{code:borderStyle=solid}
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}
{code}

Java:
{code:borderStyle=solid}
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
  "previous",
  lag(dataFrame.col("VBB"), 1)
.over(Window.orderBy(dataFrame.col("VAA")))
  );
{code}

Expected Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":null}
{"VAA":"A", "VBB":1, "previous":null}
{"VAA":"B", "VBB":-1, "previous":1}
{"VAA":"C", "VBB":2, "previous":-1}
{"VAA":"d", "VBB":3, "previous":2}
{code}

Actual Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":103079215105}
{"VAA":"A", "VBB":1, "previous":103079215105}
{"VAA":"B", "VBB":-1, "previous":103079215105}
{"VAA":"C", "VBB":2, "previous":103079215105}
{"VAA":"d", "VBB":3, "previous":103079215105}
{code}






> Lag Analytic function broken
> 
>
> Key: SPARK-10893
> URL: https://issues.apache.org/jira/browse/SPARK-10893
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.5.0
> Environment: Spark Standalone Cluster on Linux
>Reporter: Jo Desmet
>
> Trying to aggregate with the LAG Analytic function gives the wrong result. In 
> my testcase it was always giving the fixed value '103079215105' when I tried 
> to run on an integer.
> Note that this only happens on Spark 1.5.0, and only when running in cluster 
> mode.
> It works fine when running on Spark 1.4.1, or when running in local mode. 
> I did not test on a yarn cluster.
> I did not test other analytic aggregates.
> Input JSON:
> {code:borderStyle=solid}
> {"VAA":"A", "VBB":1}
> {"VAA":"B", "VBB":-1}
> {"VAA":"C", "VBB":2}
> {"VAA":"d", "VBB":3}
> {"VAA":null, "VBB":null}
> {code}
> Java:
> {code:borderStyle=solid}
> SparkContext sc = new SparkContext(conf);
> HiveContext sqlContext = new HiveContext(sc);
> DataFrame df = sqlContext.read().json(getInputPath("input.json"));
> 
> df = df.withColumn(
>   "previous",
>   lag(dataFrame.col("VBB"), 1)
> .over(Window.orderBy(dataFrame.col("VAA")))
>   );
> {code}
> Submitting the job:
> {code:borderStyle=solid}
> spark-submit \
>   --master spark:\\xx:7077 \
>   --deploy-mode client \
>   --class package.to.DriverClass \
>   --driver-java-options -Dhdp

[jira] [Updated] (SPARK-10893) Lag Analytic function broken

2015-10-02 Thread Jo Desmet (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jo Desmet updated SPARK-10893:
--
Description: 
Trying to aggregate with the LAG Analytic function gives the wrong result. In 
my testcase it was always giving the fixed value '103079215105' when I tried to 
run on an integer.
Note that this only happens on Spark 1.5.0, and only when running in cluster 
mode.
It works fine when running on Spark 1.4.1, or when running in local mode. 
I did not test on a yarn cluster.
I did not test other analytic aggregates.

Input JSON:
{code:borderStyle=solid}
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}
{code}

Java:
{code:borderStyle=solid}
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
  "previous",
  lag(dataFrame.col("VBB"), 1)
.over(Window.orderBy(dataFrame.col("VAA")))
  );
{code}

It is important to understand the conditions under which the job ran; I 
submitted to a standalone Spark cluster in client mode as follows:
{code:borderStyle=solid}
spark-submit \
  --master spark:\\xx:7077 \
  --deploy-mode client \
  --class package.to.DriverClass \
  --driver-java-options -Dhdp.version=2.2.0.0–2041 \
  --num-executors 2 \
  --driver-memory 2g \
  --executor-memory 2g \
  --executor-cores 2 \
  /path/to/sample-program.jar
{code}

Expected Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":null}
{"VAA":"A", "VBB":1, "previous":null}
{"VAA":"B", "VBB":-1, "previous":1}
{"VAA":"C", "VBB":2, "previous":-1}
{"VAA":"d", "VBB":3, "previous":2}
{code}

Actual Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":103079215105}
{"VAA":"A", "VBB":1, "previous":103079215105}
{"VAA":"B", "VBB":-1, "previous":103079215105}
{"VAA":"C", "VBB":2, "previous":103079215105}
{"VAA":"d", "VBB":3, "previous":103079215105}
{code}





  was:
Trying to aggregate with the LAG Analytic function gives the wrong result. In 
my testcase it was always giving the fixed value '103079215105' when I tried to 
run on an integer.
Note that this only happens on Spark 1.5.0, and only when running in cluster 
mode.
It works fine when running on Spark 1.4.1, or when running in local mode. 
I did not test on a yarn cluster.
I did not test other analytic aggregates.

Input JSON:
{code:borderStyle=solid}
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}
{code}

Java:
{code:borderStyle=solid}
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
  "previous",
  lag(dataFrame.col("VBB"), 1)
.over(Window.orderBy(dataFrame.col("VAA")))
  );
{code}

Submitting the job:
{code:borderStyle=solid}
spark-submit \
  --master spark:\\xx:7077 \
  --deploy-mode client \
  --class package.to.DriverClass \
  --driver-java-options -Dhdp.version=2.2.0.0–2041 \
  --num-executors 2 \
  --driver-memory 2g \
  --executor-memory 2g \
  --executor-cores 2 \
  /path/to/sample-program.jar
{code}

Expected Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":null}
{"VAA":"A", "VBB":1, "previous":null}
{"VAA":"B", "VBB":-1, "previous":1}
{"VAA":"C", "VBB":2, "previous":-1}
{"VAA":"d", "VBB":3, "previous":2}
{code}

Actual Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":103079215105}
{"VAA":"A", "VBB":1, "previous":103079215105}
{"VAA":"B", "VBB":-1, "previous":103079215105}
{"VAA":"C", "VBB":2, "previous":103079215105}
{"VAA":"d", "VBB":3, "previous":103079215105}
{code}






> Lag Analytic function broken
> 
>
> Key: SPARK-10893
> URL: https://issues.apache.org/jira/browse/SPARK-10893
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.5.0
> Environment: Spark Standalone Cluster on Linux
>Reporter: Jo Desmet
>
> Trying to aggregate with the LAG Analytic function gives the wrong result. In 
> my testcase it was always giving the fixed value '103079215105' when I tried 
> to run on an integer.
> Note that this only happens on Spark 1.5.0, and only when running in cluster 
> mode.
> It works fine when running on Spark 1.4.1, or when running in local mode. 
> I did not test on a yarn cluster.
> I did not test other analytic aggregates.
> Input JSON:
> {code:borderStyle=solid}
> {"VAA":"A", "VBB":1}
> {"VAA":"B", "VBB":-1}
> {"VAA":"C", "VBB":2}
> {"VAA":"d", "VBB":3}
> {"VAA":null, "VBB":null}
> {code}
> Java:
> {code:borderStyle=solid}
> SparkContext sc = new SparkContext(conf);
> HiveContext sqlContext = new

[jira] [Created] (SPARK-10915) Add support for UDAFs in Python

2015-10-02 Thread Justin Uang (JIRA)
Justin Uang created SPARK-10915:
---

 Summary: Add support for UDAFs in Python
 Key: SPARK-10915
 URL: https://issues.apache.org/jira/browse/SPARK-10915
 Project: Spark
  Issue Type: Improvement
  Components: PySpark, SQL
Affects Versions: 0.5.0
Reporter: Justin Uang


This should support Python-defined lambdas.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10915) Add support for UDAFs in Python

2015-10-02 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-10915:

Affects Version/s: (was: 0.5.0)

> Add support for UDAFs in Python
> ---
>
> Key: SPARK-10915
> URL: https://issues.apache.org/jira/browse/SPARK-10915
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Reporter: Justin Uang
>
> This should support Python-defined lambdas.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10779) Set initialModel for KMeans model in PySpark (spark.mllib)

2015-10-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941635#comment-14941635
 ] 

Apache Spark commented on SPARK-10779:
--

User 'evanyc15' has created a pull request for this issue:
https://github.com/apache/spark/pull/8967

> Set initialModel for KMeans model in PySpark (spark.mllib)
> --
>
> Key: SPARK-10779
> URL: https://issues.apache.org/jira/browse/SPARK-10779
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib, PySpark
>Reporter: Joseph K. Bradley
>
> Provide initialModel param for pyspark.mllib.clustering.KMeans



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10779) Set initialModel for KMeans model in PySpark (spark.mllib)

2015-10-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10779:


Assignee: (was: Apache Spark)

> Set initialModel for KMeans model in PySpark (spark.mllib)
> --
>
> Key: SPARK-10779
> URL: https://issues.apache.org/jira/browse/SPARK-10779
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib, PySpark
>Reporter: Joseph K. Bradley
>
> Provide initialModel param for pyspark.mllib.clustering.KMeans



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10779) Set initialModel for KMeans model in PySpark (spark.mllib)

2015-10-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10779:


Assignee: Apache Spark

> Set initialModel for KMeans model in PySpark (spark.mllib)
> --
>
> Key: SPARK-10779
> URL: https://issues.apache.org/jira/browse/SPARK-10779
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib, PySpark
>Reporter: Joseph K. Bradley
>Assignee: Apache Spark
>
> Provide initialModel param for pyspark.mllib.clustering.KMeans



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10893) Lag Analytic function broken

2015-10-02 Thread Jo Desmet (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jo Desmet updated SPARK-10893:
--
Description: 
Trying to aggregate with the LAG Analytic function gives the wrong result. In 
my testcase it was always giving the fixed value '103079215105' when I tried to 
run on an integer.
Note that this only happens on Spark 1.5.0, and only when running in cluster 
mode.
It works fine when running on Spark 1.4.1, or when running in local mode. 
I did not test on a yarn cluster.
I did not test other analytic aggregates.

Input JSON:
{code:borderStyle=solid}
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}
{code}

Java:
{code:borderStyle=solid}
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json("file:///home/app/input.json");

df = df.withColumn(
  "previous",
  lag(dataFrame.col("VBB"), 1)
.over(Window.orderBy(dataFrame.col("VAA")))
  );
{code}

It is important to understand the conditions under which the job ran; I 
submitted to a standalone Spark cluster in client mode as follows:
{code:borderStyle=solid}
spark-submit \
  --master spark:\\xx:7077 \
  --deploy-mode client \
  --class package.to.DriverClass \
  --driver-java-options -Dhdp.version=2.2.0.0–2041 \
  --num-executors 2 \
  --driver-memory 2g \
  --executor-memory 2g \
  --executor-cores 2 \
  /path/to/sample-program.jar
{code}

Expected Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":null}
{"VAA":"A", "VBB":1, "previous":null}
{"VAA":"B", "VBB":-1, "previous":1}
{"VAA":"C", "VBB":2, "previous":-1}
{"VAA":"d", "VBB":3, "previous":2}
{code}

Actual Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":103079215105}
{"VAA":"A", "VBB":1, "previous":103079215105}
{"VAA":"B", "VBB":-1, "previous":103079215105}
{"VAA":"C", "VBB":2, "previous":103079215105}
{"VAA":"d", "VBB":3, "previous":103079215105}
{code}





  was:
Trying to aggregate with the LAG Analytic function gives the wrong result. In 
my testcase it was always giving the fixed value '103079215105' when I tried to 
run on an integer.
Note that this only happens on Spark 1.5.0, and only when running in cluster 
mode.
It works fine when running on Spark 1.4.1, or when running in local mode. 
I did not test on a yarn cluster.
I did not test other analytic aggregates.

Input JSON:
{code:borderStyle=solid}
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}
{code}

Java:
{code:borderStyle=solid}
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
  "previous",
  lag(dataFrame.col("VBB"), 1)
.over(Window.orderBy(dataFrame.col("VAA")))
  );
{code}

Important to understand the conditions under which the job ran, I submitted to 
a standalone spark cluster in client mode as follows:
{code:borderStyle=solid}
spark-submit \
  --master spark:\\xx:7077 \
  --deploy-mode client \
  --class package.to.DriverClass \
  --driver-java-options -Dhdp.version=2.2.0.0–2041 \
  --num-executors 2 \
  --driver-memory 2g \
  --executor-memory 2g \
  --executor-cores 2 \
  /path/to/sample-program.jar
{code}

Expected Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":null}
{"VAA":"A", "VBB":1, "previous":null}
{"VAA":"B", "VBB":-1, "previous":1}
{"VAA":"C", "VBB":2, "previous":-1}
{"VAA":"d", "VBB":3, "previous":2}
{code}

Actual Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":103079215105}
{"VAA":"A", "VBB":1, "previous":103079215105}
{"VAA":"B", "VBB":-1, "previous":103079215105}
{"VAA":"C", "VBB":2, "previous":103079215105}
{"VAA":"d", "VBB":3, "previous":103079215105}
{code}






> Lag Analytic function broken
> 
>
> Key: SPARK-10893
> URL: https://issues.apache.org/jira/browse/SPARK-10893
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.5.0
> Environment: Spark Standalone Cluster on Linux
>Reporter: Jo Desmet
>
> Trying to aggregate with the LAG Analytic function gives the wrong result. In 
> my testcase it was always giving the fixed value '103079215105' when I tried 
> to run on an integer.
> Note that this only happens on Spark 1.5.0, and only when running in cluster 
> mode.
> It works fine when running on Spark 1.4.1, or when running in local mode. 
> I did not test on a yarn cluster.
> I did not test other analytic aggregates.
> Input JSON:
> {code:borderStyle=solid}
> {"VAA":"A", "VBB":1}
> {"VAA":"B", "VBB":-1}
> {"VAA":"C", "VBB":2}
> {"VAA":"d", "VBB":3}
> {"VAA":null, "VBB":null}
> {code}

[jira] [Updated] (SPARK-10893) Lag Analytic function broken

2015-10-02 Thread Jo Desmet (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jo Desmet updated SPARK-10893:
--
Description: 
Trying to aggregate with the LAG Analytic function gives the wrong result. In 
my testcase it was always giving the fixed value '103079215105' when I tried to 
run on an integer.
Note that this only happens on Spark 1.5.0, and only when running in cluster 
mode.
It works fine when running on Spark 1.4.1, or when running in local mode. 
I did not test on a yarn cluster.
I did not test other analytic aggregates.

Input JSON:
{code:borderStyle=solid;title=/home/app/input.json}
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}
{code}

Java:
{code:borderStyle=solid}
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json("file:///home/app/input.json");

df = df.withColumn(
  "previous",
  lag(dataFrame.col("VBB"), 1)
.over(Window.orderBy(dataFrame.col("VAA")))
  );
{code}

It is important to understand the conditions under which the job ran; I 
submitted to a standalone Spark cluster in client mode as follows:
{code:borderStyle=solid}
spark-submit \
  --master spark:\\xx:7077 \
  --deploy-mode client \
  --class package.to.DriverClass \
  --driver-java-options -Dhdp.version=2.2.0.0–2041 \
  --num-executors 2 \
  --driver-memory 2g \
  --executor-memory 2g \
  --executor-cores 2 \
  /path/to/sample-program.jar
{code}

Expected Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":null}
{"VAA":"A", "VBB":1, "previous":null}
{"VAA":"B", "VBB":-1, "previous":1}
{"VAA":"C", "VBB":2, "previous":-1}
{"VAA":"d", "VBB":3, "previous":2}
{code}

Actual Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":103079215105}
{"VAA":"A", "VBB":1, "previous":103079215105}
{"VAA":"B", "VBB":-1, "previous":103079215105}
{"VAA":"C", "VBB":2, "previous":103079215105}
{"VAA":"d", "VBB":3, "previous":103079215105}
{code}





  was:
Trying to aggregate with the LAG Analytic function gives the wrong result. In 
my testcase it was always giving the fixed value '103079215105' when I tried to 
run on an integer.
Note that this only happens on Spark 1.5.0, and only when running in cluster 
mode.
It works fine when running on Spark 1.4.1, or when running in local mode. 
I did not test on a yarn cluster.
I did not test other analytic aggregates.

Input JSON:
{code:borderStyle=solid,title=/home/app/input.json}
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}
{code}

Java:
{code:borderStyle=solid}
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json("file:///home/app/input.json");

df = df.withColumn(
  "previous",
  lag(dataFrame.col("VBB"), 1)
.over(Window.orderBy(dataFrame.col("VAA")))
  );
{code}

Important to understand the conditions under which the job ran, I submitted to 
a standalone spark cluster in client mode as follows:
{code:borderStyle=solid}
spark-submit \
  --master spark:\\xx:7077 \
  --deploy-mode client \
  --class package.to.DriverClass \
  --driver-java-options -Dhdp.version=2.2.0.0–2041 \
  --num-executors 2 \
  --driver-memory 2g \
  --executor-memory 2g \
  --executor-cores 2 \
  /path/to/sample-program.jar
{code}

Expected Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":null}
{"VAA":"A", "VBB":1, "previous":null}
{"VAA":"B", "VBB":-1, "previous":1}
{"VAA":"C", "VBB":2, "previous":-1}
{"VAA":"d", "VBB":3, "previous":2}
{code}

Actual Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":103079215105}
{"VAA":"A", "VBB":1, "previous":103079215105}
{"VAA":"B", "VBB":-1, "previous":103079215105}
{"VAA":"C", "VBB":2, "previous":103079215105}
{"VAA":"d", "VBB":3, "previous":103079215105}
{code}






> Lag Analytic function broken
> 
>
> Key: SPARK-10893
> URL: https://issues.apache.org/jira/browse/SPARK-10893
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.5.0
> Environment: Spark Standalone Cluster on Linux
>Reporter: Jo Desmet
>
> Trying to aggregate with the LAG Analytic function gives the wrong result. In 
> my testcase it was always giving the fixed value '103079215105' when I tried 
> to run on an integer.
> Note that this only happens on Spark 1.5.0, and only when running in cluster 
> mode.
> It works fine when running on Spark 1.4.1, or when running in local mode. 
> I did not test on a yarn cluster.
> I did not test other analytic aggregates.
> Input JSON:
> {code:borderStyle=solid;title=/home/app/input.json}
> {"VAA":"A", "VBB":1}
> {"VAA":"B", "VBB":-1

[jira] [Updated] (SPARK-10893) Lag Analytic function broken

2015-10-02 Thread Jo Desmet (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jo Desmet updated SPARK-10893:
--
Description: 
Trying to aggregate with the LAG Analytic function gives the wrong result. In 
my testcase it was always giving the fixed value '103079215105' when I tried to 
run on an integer.
Note that this only happens on Spark 1.5.0, and only when running in cluster 
mode.
It works fine when running on Spark 1.4.1, or when running in local mode. 
I did not test on a yarn cluster.
I did not test other analytic aggregates.

Input JSON:
{code:borderStyle=solid,title=/home/app/input.json}
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}
{code}

Java:
{code:borderStyle=solid}
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json("file:///home/app/input.json");

df = df.withColumn(
  "previous",
  lag(dataFrame.col("VBB"), 1)
.over(Window.orderBy(dataFrame.col("VAA")))
  );
{code}

It is important to understand the conditions under which the job ran; I 
submitted to a standalone Spark cluster in client mode as follows:
{code:borderStyle=solid}
spark-submit \
  --master spark:\\xx:7077 \
  --deploy-mode client \
  --class package.to.DriverClass \
  --driver-java-options -Dhdp.version=2.2.0.0–2041 \
  --num-executors 2 \
  --driver-memory 2g \
  --executor-memory 2g \
  --executor-cores 2 \
  /path/to/sample-program.jar
{code}

Expected Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":null}
{"VAA":"A", "VBB":1, "previous":null}
{"VAA":"B", "VBB":-1, "previous":1}
{"VAA":"C", "VBB":2, "previous":-1}
{"VAA":"d", "VBB":3, "previous":2}
{code}

Actual Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":103079215105}
{"VAA":"A", "VBB":1, "previous":103079215105}
{"VAA":"B", "VBB":-1, "previous":103079215105}
{"VAA":"C", "VBB":2, "previous":103079215105}
{"VAA":"d", "VBB":3, "previous":103079215105}
{code}





  was:
Trying to aggregate with the LAG Analytic function gives the wrong result. In 
my testcase it was always giving the fixed value '103079215105' when I tried to 
run on an integer.
Note that this only happens on Spark 1.5.0, and only when running in cluster 
mode.
It works fine when running on Spark 1.4.1, or when running in local mode. 
I did not test on a yarn cluster.
I did not test other analytic aggregates.

Input JSON:
{code:borderStyle=solid}
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}
{code}

Java:
{code:borderStyle=solid}
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json("file:///home/app/input.json");

df = df.withColumn(
  "previous",
  lag(dataFrame.col("VBB"), 1)
.over(Window.orderBy(dataFrame.col("VAA")))
  );
{code}

Important to understand the conditions under which the job ran, I submitted to 
a standalone spark cluster in client mode as follows:
{code:borderStyle=solid}
spark-submit \
  --master spark:\\xx:7077 \
  --deploy-mode client \
  --class package.to.DriverClass \
  --driver-java-options -Dhdp.version=2.2.0.0–2041 \
  --num-executors 2 \
  --driver-memory 2g \
  --executor-memory 2g \
  --executor-cores 2 \
  /path/to/sample-program.jar
{code}

Expected Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":null}
{"VAA":"A", "VBB":1, "previous":null}
{"VAA":"B", "VBB":-1, "previous":1}
{"VAA":"C", "VBB":2, "previous":-1}
{"VAA":"d", "VBB":3, "previous":2}
{code}

Actual Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":103079215105}
{"VAA":"A", "VBB":1, "previous":103079215105}
{"VAA":"B", "VBB":-1, "previous":103079215105}
{"VAA":"C", "VBB":2, "previous":103079215105}
{"VAA":"d", "VBB":3, "previous":103079215105}
{code}






> Lag Analytic function broken
> 
>
> Key: SPARK-10893
> URL: https://issues.apache.org/jira/browse/SPARK-10893
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.5.0
> Environment: Spark Standalone Cluster on Linux
>Reporter: Jo Desmet
>
> Trying to aggregate with the LAG Analytic function gives the wrong result. In 
> my testcase it was always giving the fixed value '103079215105' when I tried 
> to run on an integer.
> Note that this only happens on Spark 1.5.0, and only when running in cluster 
> mode.
> It works fine when running on Spark 1.4.1, or when running in local mode. 
> I did not test on a yarn cluster.
> I did not test other analytic aggregates.
> Input JSON:
> {code:borderStyle=solid,title=/home/app/input.json}
> {"VAA":"A", "VBB":1}
> {"VAA":"B", "VBB":-1}
> {"VAA":"C", "VBB":2}
> 

[jira] [Updated] (SPARK-10893) Lag Analytic function broken

2015-10-02 Thread Jo Desmet (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jo Desmet updated SPARK-10893:
--
Description: 
Trying to aggregate with the LAG Analytic function gives the wrong result. In 
my testcase it was always giving the fixed value '103079215105' when I tried to 
run on an integer.
Note that this only happens on Spark 1.5.0, and only when running in cluster 
mode.
It works fine when running on Spark 1.4.1, or when running in local mode. 
I did not test on a yarn cluster.
I did not test other analytic aggregates.

Input JSON:
{code:borderStyle=solid|title=/home/app/input.json}
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}
{code}

Java:
{code:borderStyle=solid}
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json("file:///home/app/input.json");

df = df.withColumn(
  "previous",
  lag(dataFrame.col("VBB"), 1)
.over(Window.orderBy(dataFrame.col("VAA")))
  );
{code}

It is important to understand the conditions under which the job ran; I 
submitted to a standalone Spark cluster in client mode as follows:
{code:borderStyle=solid}
spark-submit \
  --master spark:\\xx:7077 \
  --deploy-mode client \
  --class package.to.DriverClass \
  --driver-java-options -Dhdp.version=2.2.0.0–2041 \
  --num-executors 2 \
  --driver-memory 2g \
  --executor-memory 2g \
  --executor-cores 2 \
  /path/to/sample-program.jar
{code}

Expected Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":null}
{"VAA":"A", "VBB":1, "previous":null}
{"VAA":"B", "VBB":-1, "previous":1}
{"VAA":"C", "VBB":2, "previous":-1}
{"VAA":"d", "VBB":3, "previous":2}
{code}

Actual Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":103079215105}
{"VAA":"A", "VBB":1, "previous":103079215105}
{"VAA":"B", "VBB":-1, "previous":103079215105}
{"VAA":"C", "VBB":2, "previous":103079215105}
{"VAA":"d", "VBB":3, "previous":103079215105}
{code}





  was:
Trying to aggregate with the LAG Analytic function gives the wrong result. In 
my testcase it was always giving the fixed value '103079215105' when I tried to 
run on an integer.
Note that this only happens on Spark 1.5.0, and only when running in cluster 
mode.
It works fine when running on Spark 1.4.1, or when running in local mode. 
I did not test on a yarn cluster.
I did not test other analytic aggregates.

Input JSON:
{code:borderStyle=solid;title=/home/app/input.json}
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}
{code}

Java:
{code:borderStyle=solid}
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json("file:///home/app/input.json");

df = df.withColumn(
  "previous",
  lag(dataFrame.col("VBB"), 1)
.over(Window.orderBy(dataFrame.col("VAA")))
  );
{code}

Important to understand the conditions under which the job ran, I submitted to 
a standalone spark cluster in client mode as follows:
{code:borderStyle=solid}
spark-submit \
  --master spark:\\xx:7077 \
  --deploy-mode client \
  --class package.to.DriverClass \
  --driver-java-options -Dhdp.version=2.2.0.0–2041 \
  --num-executors 2 \
  --driver-memory 2g \
  --executor-memory 2g \
  --executor-cores 2 \
  /path/to/sample-program.jar
{code}

Expected Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":null}
{"VAA":"A", "VBB":1, "previous":null}
{"VAA":"B", "VBB":-1, "previous":1}
{"VAA":"C", "VBB":2, "previous":-1}
{"VAA":"d", "VBB":3, "previous":2}
{code}

Actual Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":103079215105}
{"VAA":"A", "VBB":1, "previous":103079215105}
{"VAA":"B", "VBB":-1, "previous":103079215105}
{"VAA":"C", "VBB":2, "previous":103079215105}
{"VAA":"d", "VBB":3, "previous":103079215105}
{code}






> Lag Analytic function broken
> 
>
> Key: SPARK-10893
> URL: https://issues.apache.org/jira/browse/SPARK-10893
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.5.0
> Environment: Spark Standalone Cluster on Linux
>Reporter: Jo Desmet
>
> Trying to aggregate with the LAG Analytic function gives the wrong result. In 
> my testcase it was always giving the fixed value '103079215105' when I tried 
> to run on an integer.
> Note that this only happens on Spark 1.5.0, and only when running in cluster 
> mode.
> It works fine when running on Spark 1.4.1, or when running in local mode. 
> I did not test on a yarn cluster.
> I did not test other analytic aggregates.
> Input JSON:
> {code:borderStyle=solid|title=/home/app/input.json}
> {"VAA":"A", "VBB":1}
> {"VAA":"B", "VBB":-1

[jira] [Commented] (SPARK-9570) Consistent recommendation for submitting spark apps to YARN, -master yarn --deploy-mode x vs -master yarn-x'.

2015-10-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941701#comment-14941701
 ] 

Apache Spark commented on SPARK-9570:
-

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/8968

> Consistent recommendation for submitting spark apps to YARN, -master yarn 
> --deploy-mode x vs -master yarn-x'.
> -
>
> Key: SPARK-9570
> URL: https://issues.apache.org/jira/browse/SPARK-9570
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, Spark Submit, YARN
>Affects Versions: 1.4.1
>Reporter: Neelesh Srinivas Salian
>Priority: Minor
>  Labels: starter
>
> There are still some inconsistencies in the documentation regarding 
> submission of the applications for yarn.
> SPARK-3629 was done to correct the same but 
> http://spark.apache.org/docs/latest/submitting-applications.html#master-urls
> still has yarn-client and yarn-cluster as opposed to the norm of having 
> --master yarn and --deploy-mode cluster / client.
> Need to change this appropriately (if needed) to avoid confusion:
> https://spark.apache.org/docs/latest/running-on-yarn.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-761) Print a nicer error message when incompatible Spark binaries try to talk

2015-10-02 Thread Harsh Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh Gupta updated SPARK-761:
--
Comment: was deleted

(was: [~aash] How do I do a compatibility check on the API over which they talk? Can 
you give a bit more specific detail on how to proceed? I can do it as a 
starter task to understand the core of Spark's functioning, and that will get me 
going.)

> Print a nicer error message when incompatible Spark binaries try to talk
> 
>
> Key: SPARK-761
> URL: https://issues.apache.org/jira/browse/SPARK-761
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Matei Zaharia
>Priority: Minor
>  Labels: starter
>
> As a starter task, it would be good to audit the current behavior for 
> different client <-> server pairs with respect to how exceptions occur.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-761) Print a nicer error message when incompatible Spark binaries try to talk

2015-10-02 Thread Harsh Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh Gupta updated SPARK-761:
--
Comment: was deleted

(was: [~aash] Which APIs should be taken care of? Which versions should it 
affect?)

> Print a nicer error message when incompatible Spark binaries try to talk
> 
>
> Key: SPARK-761
> URL: https://issues.apache.org/jira/browse/SPARK-761
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Matei Zaharia
>Priority: Minor
>  Labels: starter
>
> As a starter task, it would be good to audit the current behavior for 
> different client <-> server pairs with respect to how exceptions occur.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10916) YARN executors are launched with the default perm gen size

2015-10-02 Thread Marcelo Vanzin (JIRA)
Marcelo Vanzin created SPARK-10916:
--

 Summary: YARN executors are launched with the default perm gen size
 Key: SPARK-10916
 URL: https://issues.apache.org/jira/browse/SPARK-10916
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.5.1, 1.6.0
Reporter: Marcelo Vanzin


Unlike other backends, the YARN one does not explicitly set the perm gen size 
for the executor process. That means that, unless the user has explicitly 
changed it by adding extra java options, executors on YARN are running with 64m 
of perm gen (I believe) instead of 256m like the other backends.
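
A minimal sketch of the user-side workaround hinted at above, using the standard {{spark.executor.extraJavaOptions}} property (the 256m value mirrors the other backends):
{code}
import org.apache.spark.SparkConf

// Until the YARN backend sets this itself, pass the perm gen size explicitly
// through the executor's extra Java options (or via --conf on spark-submit).
val conf = new SparkConf()
  .set("spark.executor.extraJavaOptions", "-XX:MaxPermSize=256m")
{code}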



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10625) Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds unserializable objects into connection properties

2015-10-02 Thread Peng Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941775#comment-14941775
 ] 

Peng Cheng commented on SPARK-10625:


The patch can be merged immediately; can someone verify and merge it?
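
Purely as an illustration of the defensive-copy idea at stake here (a sketch, not the actual patch): copying only plain string entries into a fresh Properties keeps driver-injected objects out of task serialization.
{code}
import java.util.Properties
import scala.collection.JavaConverters._

// Illustrative only: retain just the String-valued entries so that objects a
// JDBC driver may have injected into the Properties are never serialized.
def stringOnlyCopy(props: Properties): Properties = {
  val clean = new Properties()
  props.stringPropertyNames().asScala.foreach { key =>
    clean.setProperty(key, props.getProperty(key))
  }
  clean
}
{code}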

> Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds 
> unserializable objects into connection properties
> --
>
> Key: SPARK-10625
> URL: https://issues.apache.org/jira/browse/SPARK-10625
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.1, 1.5.0
> Environment: Ubuntu 14.04
>Reporter: Peng Cheng
>  Labels: jdbc, spark, sparksql
>
> Some JDBC drivers (e.g. SAP HANA) try to optimize connection pooling by 
> adding new objects into the connection properties, which are then reused by 
> Spark and shipped to workers. When some of these new objects are not 
> serializable, this triggers an org.apache.spark.SparkException: Task not 
> serializable. The following test code snippet demonstrates this problem by 
> using a modified H2 driver:
>   test("INSERT to JDBC Datasource with UnserializableH2Driver") {
> object UnserializableH2Driver extends org.h2.Driver {
>   override def connect(url: String, info: Properties): Connection = {
> val result = super.connect(url, info)
> info.put("unserializableDriver", this)
> result
>   }
>   override def getParentLogger: Logger = ???
> }
> import scala.collection.JavaConversions._
> val oldDrivers = 
> DriverManager.getDrivers.filter(_.acceptsURL("jdbc:h2:")).toSeq
> oldDrivers.foreach{
>   DriverManager.deregisterDriver
> }
> DriverManager.registerDriver(UnserializableH2Driver)
> sql("INSERT INTO TABLE PEOPLE1 SELECT * FROM PEOPLE")
> assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1", properties).count)
> assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1", 
> properties).collect()(0).length)
> DriverManager.deregisterDriver(UnserializableH2Driver)
> oldDrivers.foreach{
>   DriverManager.registerDriver
> }
>   }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10780) Set initialModel in KMeans in Pipelines API

2015-10-02 Thread Jayant Shekhar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941781#comment-14941781
 ] 

Jayant Shekhar commented on SPARK-10780:


Almost done with it!
I would get some inputs from the team on my implementation when I have 
submitted the PR today evening.


> Set initialModel in KMeans in Pipelines API
> ---
>
> Key: SPARK-10780
> URL: https://issues.apache.org/jira/browse/SPARK-10780
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Reporter: Joseph K. Bradley
>
> This is for the Scala version.  After this is merged, create a JIRA for 
> Python version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10798) JsonMappingException with Spark Context Parallelize

2015-10-02 Thread Miao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941799#comment-14941799
 ] 

Miao Wang commented on SPARK-10798:
---

Hi Dev,

When I paste the following line in spark-shell: List<Row> rows = new Vector<Row>();

scala> List<Row> rows = new Vector<Row>();
<console>:23: error: value < is not a member of object List
  List<Row> rows = new Vector<Row>();

I tried to import the Row class, but it still shows the above error.

Can you provide me more information on how to recreate this issue?

Thanks!

Miao

> JsonMappingException with Spark Context Parallelize
> ---
>
> Key: SPARK-10798
> URL: https://issues.apache.org/jira/browse/SPARK-10798
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.0
> Environment: Linux, Java 1.8.45
>Reporter: Dev Lakhani
>
> When trying to create an RDD of Rows using a Java Spark Context and if I 
> serialize the rows with Kryo first, the sparkContext fails.
> byte[] data= Kryo.serialize(List)
> List<Row> fromKryoRows=Kryo.unserialize(data)
> List<Row> rows= new Vector<Row>(); //using a new set of data.
> rows.add(RowFactory.create("test"));
> javaSparkContext.parallelize(rows);
> OR
> javaSparkContext.parallelize(fromKryoRows); //using deserialized rows
> I get :
> com.fasterxml.jackson.databind.JsonMappingException: (None,None) (of class 
> scala.Tuple2) (through reference chain: 
> org.apache.spark.rdd.RDDOperationScope["parent"])
>at 
> com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:210)
>at 
> com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:177)
>at 
> com.fasterxml.jackson.databind.ser.std.StdSerializer.wrapAndThrow(StdSerializer.java:187)
>at 
> com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:647)
>at 
> com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:152)
>at 
> com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:128)
>at 
> com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:2881)
>at 
> com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2338)
>at 
> org.apache.spark.rdd.RDDOperationScope.toJson(RDDOperationScope.scala:50)
>at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:141)
>at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
>at 
> org.apache.spark.SparkContext.withScope(SparkContext.scala:700)
>at 
> org.apache.spark.SparkContext.parallelize(SparkContext.scala:714)
>at 
> org.apache.spark.api.java.JavaSparkContext.parallelize(JavaSparkContext.scala:145)
>at 
> org.apache.spark.api.java.JavaSparkContext.parallelize(JavaSparkContext.scala:157)
>...
> Caused by: scala.MatchError: (None,None) (of class scala.Tuple2)
>at 
> com.fasterxml.jackson.module.scala.ser.OptionSerializer$$anonfun$serialize$1.apply$mcV$sp(OptionSerializerModule.scala:32)
>at 
> com.fasterxml.jackson.module.scala.ser.OptionSerializer$$anonfun$serialize$1.apply(OptionSerializerModule.scala:32)
>at 
> com.fasterxml.jackson.module.scala.ser.OptionSerializer$$anonfun$serialize$1.apply(OptionSerializerModule.scala:32)
>at scala.Option.getOrElse(Option.scala:120)
>at 
> com.fasterxml.jackson.module.scala.ser.OptionSerializer.serialize(OptionSerializerModule.scala:31)
>at 
> com.fasterxml.jackson.module.scala.ser.OptionSerializer.serialize(OptionSerializerModule.scala:22)
>at 
> com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:505)
>at 
> com.fasterxml.jackson.module.scala.ser.OptionPropertyWriter.serializeAsField(OptionSerializerModule.scala:128)
>at 
> com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:639)
>... 19 more
> I've tried updating jackson module scala to 2.6.1 but same issue. This 
> happens in local mode with java 1.8_45. I searched the web and this Jira for 
> similar issues but found nothing of interest.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10798) JsonMappingException with Spark Context Parallelize

2015-10-02 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941804#comment-14941804
 ] 

Sean Owen commented on SPARK-10798:
---

That's Java code, and you're pasting it in the scala shell.
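
For the shell, a Scala equivalent of that snippet would be (a sketch; {{sc}} is the spark-shell's SparkContext):
{code}
import org.apache.spark.sql.Row

// Scala stand-in for the Java List<Row> / Vector<Row> code from the report.
val rows = Seq(Row("test"))
val rdd = sc.parallelize(rows)
rdd.collect().foreach(println)
{code}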

> JsonMappingException with Spark Context Parallelize
> ---
>
> Key: SPARK-10798
> URL: https://issues.apache.org/jira/browse/SPARK-10798
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.0
> Environment: Linux, Java 1.8.45
>Reporter: Dev Lakhani
>
> When trying to create an RDD of Rows using a Java Spark Context and if I 
> serialize the rows with Kryo first, the sparkContext fails.
> byte[] data= Kryo.serialize(List)
> List<Row> fromKryoRows=Kryo.unserialize(data)
> List<Row> rows= new Vector<Row>(); //using a new set of data.
> rows.add(RowFactory.create("test"));
> javaSparkContext.parallelize(rows);
> OR
> javaSparkContext.parallelize(fromKryoRows); //using deserialized rows
> I get :
> com.fasterxml.jackson.databind.JsonMappingException: (None,None) (of class 
> scala.Tuple2) (through reference chain: 
> org.apache.spark.rdd.RDDOperationScope["parent"])
>at 
> com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:210)
>at 
> com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:177)
>at 
> com.fasterxml.jackson.databind.ser.std.StdSerializer.wrapAndThrow(StdSerializer.java:187)
>at 
> com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:647)
>at 
> com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:152)
>at 
> com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:128)
>at 
> com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:2881)
>at 
> com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2338)
>at 
> org.apache.spark.rdd.RDDOperationScope.toJson(RDDOperationScope.scala:50)
>at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:141)
>at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
>at 
> org.apache.spark.SparkContext.withScope(SparkContext.scala:700)
>at 
> org.apache.spark.SparkContext.parallelize(SparkContext.scala:714)
>at 
> org.apache.spark.api.java.JavaSparkContext.parallelize(JavaSparkContext.scala:145)
>at 
> org.apache.spark.api.java.JavaSparkContext.parallelize(JavaSparkContext.scala:157)
>...
> Caused by: scala.MatchError: (None,None) (of class scala.Tuple2)
>at 
> com.fasterxml.jackson.module.scala.ser.OptionSerializer$$anonfun$serialize$1.apply$mcV$sp(OptionSerializerModule.scala:32)
>at 
> com.fasterxml.jackson.module.scala.ser.OptionSerializer$$anonfun$serialize$1.apply(OptionSerializerModule.scala:32)
>at 
> com.fasterxml.jackson.module.scala.ser.OptionSerializer$$anonfun$serialize$1.apply(OptionSerializerModule.scala:32)
>at scala.Option.getOrElse(Option.scala:120)
>at 
> com.fasterxml.jackson.module.scala.ser.OptionSerializer.serialize(OptionSerializerModule.scala:31)
>at 
> com.fasterxml.jackson.module.scala.ser.OptionSerializer.serialize(OptionSerializerModule.scala:22)
>at 
> com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:505)
>at 
> com.fasterxml.jackson.module.scala.ser.OptionPropertyWriter.serializeAsField(OptionSerializerModule.scala:128)
>at 
> com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:639)
>... 19 more
> I've tried updating jackson module scala to 2.6.1 but same issue. This 
> happens in local mode with java 1.8_45. I searched the web and this Jira for 
> similar issues but found nothing of interest.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-10317) start-history-server.sh CLI parsing incompatible with HistoryServer's arg parsing

2015-10-02 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-10317.

   Resolution: Fixed
 Assignee: Rekha Joshi
Fix Version/s: 1.6.0

> start-history-server.sh CLI parsing incompatible with HistoryServer's arg 
> parsing
> -
>
> Key: SPARK-10317
> URL: https://issues.apache.org/jira/browse/SPARK-10317
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.1
>Reporter: Steve Loughran
>Assignee: Rekha Joshi
>Priority: Trivial
> Fix For: 1.6.0
>
>
> The history server has its argument parsing class in 
> {{HistoryServerArguments}}. However, this doesn't get involved in the 
> {{start-history-server.sh}} codepath where the $0 arg is assigned to  
> {{spark.history.fs.logDirectory}} and all other arguments are discarded (e.g. 
> {{--property-file}}).
> This prevents the other options from being usable from this script.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10847) Pyspark - DataFrame - Optional Metadata with `None` triggers cryptic failure

2015-10-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941929#comment-14941929
 ] 

Apache Spark commented on SPARK-10847:
--

User 'jasoncl' has created a pull request for this issue:
https://github.com/apache/spark/pull/8969

> Pyspark - DataFrame - Optional Metadata with `None` triggers cryptic failure
> 
>
> Key: SPARK-10847
> URL: https://issues.apache.org/jira/browse/SPARK-10847
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 1.5.0
> Environment: Windows 7
> java version "1.8.0_60" (64bit)
> Python 3.4.x
> Standalone cluster mode (not local[n]; a full local cluster)
>Reporter: Shea Parkes
>Priority: Minor
>
> If the optional metadata passed to `pyspark.sql.types.StructField` includes a 
> pythonic `None`, the `pyspark.SparkContext.createDataFrame` will fail with a 
> very cryptic/unhelpful error.
> Here is a minimal reproducible example:
> {code:none}
> # Assumes sc exists
> import pyspark.sql.types as types
> sqlContext = SQLContext(sc)
> literal_metadata = types.StructType([
> types.StructField(
> 'name',
> types.StringType(),
> nullable=True,
> metadata={'comment': 'From accounting system.'}
> ),
> types.StructField(
> 'age',
> types.IntegerType(),
> nullable=True,
> metadata={'comment': None}
> ),
> ])
> literal_rdd = sc.parallelize([
> ['Bob', 34],
> ['Dan', 42],
> ])
> print(literal_rdd.take(2))
> failed_dataframe = sqlContext.createDataFrame(
> literal_rdd,
> literal_metadata,
> )
> {code}
> This produces the following ~stacktrace:
> {noformat}
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "", line 28, in 
>   File 
> "S:\ZQL\Software\Hotware\spark-1.5.0-bin-hadoop2.6\python\pyspark\sql\context.py",
>  line 408, in createDataFrame
> jdf = self._ssql_ctx.applySchemaToPythonRDD(jrdd.rdd(), schema.json())
>   File 
> "S:\ZQL\Software\Hotware\spark-1.5.0-bin-hadoop2.6\python\lib\py4j-0.8.2.1-src.zip\py4j\java_gateway.py",
>  line 538, in __call__
>   File 
> "S:\ZQL\Software\Hotware\spark-1.5.0-bin-hadoop2.6\python\pyspark\sql\utils.py",
>  line 36, in deco
> return f(*a, **kw)
>   File 
> "S:\ZQL\Software\Hotware\spark-1.5.0-bin-hadoop2.6\python\lib\py4j-0.8.2.1-src.zip\py4j\protocol.py",
>  line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> o757.applySchemaToPythonRDD.
> : java.lang.RuntimeException: Do not support type class scala.Tuple2.
>   at 
> org.apache.spark.sql.types.Metadata$$anonfun$fromJObject$1.apply(Metadata.scala:160)
>   at 
> org.apache.spark.sql.types.Metadata$$anonfun$fromJObject$1.apply(Metadata.scala:127)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at org.apache.spark.sql.types.Metadata$.fromJObject(Metadata.scala:127)
>   at 
> org.apache.spark.sql.types.DataType$.org$apache$spark$sql$types$DataType$$parseStructField(DataType.scala:173)
>   at 
> org.apache.spark.sql.types.DataType$$anonfun$parseDataType$1.apply(DataType.scala:148)
>   at 
> org.apache.spark.sql.types.DataType$$anonfun$parseDataType$1.apply(DataType.scala:148)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at 
> org.apache.spark.sql.types.DataType$.parseDataType(DataType.scala:148)
>   at org.apache.spark.sql.types.DataType$.fromJson(DataType.scala:96)
>   at org.apache.spark.sql.SQLContext.parseDataType(SQLContext.scala:961)
>   at 
> org.apache.spark.sql.SQLContext.applySchemaToPythonRDD(SQLContext.scala:970)
>   at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>   at py4j.Gateway.invoke(Gateway.java:259)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at py4j.GatewayConnection.run(GatewayConnection.java:207)
>   at java.lang.Thread.run(Unknown Source)
> {noformat}
> I believe the most important line of the traceback is this one:
> {noformat}
> py4j

[jira] [Assigned] (SPARK-10847) Pyspark - DataFrame - Optional Metadata with `None` triggers cryptic failure

2015-10-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10847:


Assignee: (was: Apache Spark)

> Pyspark - DataFrame - Optional Metadata with `None` triggers cryptic failure
> 
>
> Key: SPARK-10847
> URL: https://issues.apache.org/jira/browse/SPARK-10847
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 1.5.0
> Environment: Windows 7
> java version "1.8.0_60" (64bit)
> Python 3.4.x
> Standalone cluster mode (not local[n]; a full local cluster)
>Reporter: Shea Parkes
>Priority: Minor
>
> If the optional metadata passed to `pyspark.sql.types.StructField` includes a 
> pythonic `None`, the `pyspark.SparkContext.createDataFrame` will fail with a 
> very cryptic/unhelpful error.
> Here is a minimal reproducible example:
> {code:none}
> # Assumes sc exists
> import pyspark.sql.types as types
> sqlContext = SQLContext(sc)
> literal_metadata = types.StructType([
> types.StructField(
> 'name',
> types.StringType(),
> nullable=True,
> metadata={'comment': 'From accounting system.'}
> ),
> types.StructField(
> 'age',
> types.IntegerType(),
> nullable=True,
> metadata={'comment': None}
> ),
> ])
> literal_rdd = sc.parallelize([
> ['Bob', 34],
> ['Dan', 42],
> ])
> print(literal_rdd.take(2))
> failed_dataframe = sqlContext.createDataFrame(
> literal_rdd,
> literal_metadata,
> )
> {code}
> This produces the following ~stacktrace:
> {noformat}
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "", line 28, in 
>   File 
> "S:\ZQL\Software\Hotware\spark-1.5.0-bin-hadoop2.6\python\pyspark\sql\context.py",
>  line 408, in createDataFrame
> jdf = self._ssql_ctx.applySchemaToPythonRDD(jrdd.rdd(), schema.json())
>   File 
> "S:\ZQL\Software\Hotware\spark-1.5.0-bin-hadoop2.6\python\lib\py4j-0.8.2.1-src.zip\py4j\java_gateway.py",
>  line 538, in __call__
>   File 
> "S:\ZQL\Software\Hotware\spark-1.5.0-bin-hadoop2.6\python\pyspark\sql\utils.py",
>  line 36, in deco
> return f(*a, **kw)
>   File 
> "S:\ZQL\Software\Hotware\spark-1.5.0-bin-hadoop2.6\python\lib\py4j-0.8.2.1-src.zip\py4j\protocol.py",
>  line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> o757.applySchemaToPythonRDD.
> : java.lang.RuntimeException: Do not support type class scala.Tuple2.
>   at 
> org.apache.spark.sql.types.Metadata$$anonfun$fromJObject$1.apply(Metadata.scala:160)
>   at 
> org.apache.spark.sql.types.Metadata$$anonfun$fromJObject$1.apply(Metadata.scala:127)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at org.apache.spark.sql.types.Metadata$.fromJObject(Metadata.scala:127)
>   at 
> org.apache.spark.sql.types.DataType$.org$apache$spark$sql$types$DataType$$parseStructField(DataType.scala:173)
>   at 
> org.apache.spark.sql.types.DataType$$anonfun$parseDataType$1.apply(DataType.scala:148)
>   at 
> org.apache.spark.sql.types.DataType$$anonfun$parseDataType$1.apply(DataType.scala:148)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at 
> org.apache.spark.sql.types.DataType$.parseDataType(DataType.scala:148)
>   at org.apache.spark.sql.types.DataType$.fromJson(DataType.scala:96)
>   at org.apache.spark.sql.SQLContext.parseDataType(SQLContext.scala:961)
>   at 
> org.apache.spark.sql.SQLContext.applySchemaToPythonRDD(SQLContext.scala:970)
>   at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>   at py4j.Gateway.invoke(Gateway.java:259)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at py4j.GatewayConnection.run(GatewayConnection.java:207)
>   at java.lang.Thread.run(Unknown Source)
> {noformat}
> I believe the most important line of the traceback is this one:
> {noformat}
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> o757.applySchemaToPythonRDD.
> : java.lang.RuntimeExc

[jira] [Assigned] (SPARK-10847) Pyspark - DataFrame - Optional Metadata with `None` triggers cryptic failure

2015-10-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10847:


Assignee: Apache Spark

> Pyspark - DataFrame - Optional Metadata with `None` triggers cryptic failure
> 
>
> Key: SPARK-10847
> URL: https://issues.apache.org/jira/browse/SPARK-10847
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 1.5.0
> Environment: Windows 7
> java version "1.8.0_60" (64bit)
> Python 3.4.x
> Standalone cluster mode (not local[n]; a full local cluster)
>Reporter: Shea Parkes
>Assignee: Apache Spark
>Priority: Minor
>
> If the optional metadata passed to `pyspark.sql.types.StructField` includes a 
> pythonic `None`, the `pyspark.SparkContext.createDataFrame` will fail with a 
> very cryptic/unhelpful error.
> Here is a minimal reproducible example:
> {code:none}
> # Assumes sc exists
> import pyspark.sql.types as types
> sqlContext = SQLContext(sc)
> literal_metadata = types.StructType([
> types.StructField(
> 'name',
> types.StringType(),
> nullable=True,
> metadata={'comment': 'From accounting system.'}
> ),
> types.StructField(
> 'age',
> types.IntegerType(),
> nullable=True,
> metadata={'comment': None}
> ),
> ])
> literal_rdd = sc.parallelize([
> ['Bob', 34],
> ['Dan', 42],
> ])
> print(literal_rdd.take(2))
> failed_dataframe = sqlContext.createDataFrame(
> literal_rdd,
> literal_metadata,
> )
> {code}
> This produces the following ~stacktrace:
> {noformat}
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "", line 28, in 
>   File 
> "S:\ZQL\Software\Hotware\spark-1.5.0-bin-hadoop2.6\python\pyspark\sql\context.py",
>  line 408, in createDataFrame
> jdf = self._ssql_ctx.applySchemaToPythonRDD(jrdd.rdd(), schema.json())
>   File 
> "S:\ZQL\Software\Hotware\spark-1.5.0-bin-hadoop2.6\python\lib\py4j-0.8.2.1-src.zip\py4j\java_gateway.py",
>  line 538, in __call__
>   File 
> "S:\ZQL\Software\Hotware\spark-1.5.0-bin-hadoop2.6\python\pyspark\sql\utils.py",
>  line 36, in deco
> return f(*a, **kw)
>   File 
> "S:\ZQL\Software\Hotware\spark-1.5.0-bin-hadoop2.6\python\lib\py4j-0.8.2.1-src.zip\py4j\protocol.py",
>  line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> o757.applySchemaToPythonRDD.
> : java.lang.RuntimeException: Do not support type class scala.Tuple2.
>   at 
> org.apache.spark.sql.types.Metadata$$anonfun$fromJObject$1.apply(Metadata.scala:160)
>   at 
> org.apache.spark.sql.types.Metadata$$anonfun$fromJObject$1.apply(Metadata.scala:127)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at org.apache.spark.sql.types.Metadata$.fromJObject(Metadata.scala:127)
>   at 
> org.apache.spark.sql.types.DataType$.org$apache$spark$sql$types$DataType$$parseStructField(DataType.scala:173)
>   at 
> org.apache.spark.sql.types.DataType$$anonfun$parseDataType$1.apply(DataType.scala:148)
>   at 
> org.apache.spark.sql.types.DataType$$anonfun$parseDataType$1.apply(DataType.scala:148)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at 
> org.apache.spark.sql.types.DataType$.parseDataType(DataType.scala:148)
>   at org.apache.spark.sql.types.DataType$.fromJson(DataType.scala:96)
>   at org.apache.spark.sql.SQLContext.parseDataType(SQLContext.scala:961)
>   at 
> org.apache.spark.sql.SQLContext.applySchemaToPythonRDD(SQLContext.scala:970)
>   at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>   at py4j.Gateway.invoke(Gateway.java:259)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at py4j.GatewayConnection.run(GatewayConnection.java:207)
>   at java.lang.Thread.run(Unknown Source)
> {noformat}
> I believe the most important line of the traceback is this one:
> {noformat}
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> o757.applySchemaToPythonRDD.

[jira] [Commented] (SPARK-10847) Pyspark - DataFrame - Optional Metadata with `None` triggers cryptic failure

2015-10-02 Thread Shea Parkes (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941952#comment-14941952
 ] 

Shea Parkes commented on SPARK-10847:
-

I appreciate your assistance!  I think your proposal is an improvement, but I 
think it would be better if the failure was triggered upon the creation of the 
StructType object - that's where the error actually occurred.

The distance between the definition of the metadata and the import was much 
larger in my project; I think your new error message would still have me 
looking for NULL values in my data (instead of my metadata). That's likely 
partly due to my unfamiliarity with Scala, but I chased as far down the pyspark 
code as I could and didn't figure it out without trial and error.

I realize this might mean traversing an arbitrary dictionary in the StructType 
initialization looking for unallowed types, which might be unacceptable.  It 
would still be much more in line with "Crash Early, Crash Often" philosophy if 
it were possible to bomb at the creation of the metadata.

Thanks again for the assistance!

> Pyspark - DataFrame - Optional Metadata with `None` triggers cryptic failure
> 
>
> Key: SPARK-10847
> URL: https://issues.apache.org/jira/browse/SPARK-10847
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 1.5.0
> Environment: Windows 7
> java version "1.8.0_60" (64bit)
> Python 3.4.x
> Standalone cluster mode (not local[n]; a full local cluster)
>Reporter: Shea Parkes
>Priority: Minor
>
> If the optional metadata passed to `pyspark.sql.types.StructField` includes a 
> pythonic `None`, the `pyspark.SparkContext.createDataFrame` will fail with a 
> very cryptic/unhelpful error.
> Here is a minimal reproducible example:
> {code:none}
> # Assumes sc exists
> import pyspark.sql.types as types
> sqlContext = SQLContext(sc)
> literal_metadata = types.StructType([
> types.StructField(
> 'name',
> types.StringType(),
> nullable=True,
> metadata={'comment': 'From accounting system.'}
> ),
> types.StructField(
> 'age',
> types.IntegerType(),
> nullable=True,
> metadata={'comment': None}
> ),
> ])
> literal_rdd = sc.parallelize([
> ['Bob', 34],
> ['Dan', 42],
> ])
> print(literal_rdd.take(2))
> failed_dataframe = sqlContext.createDataFrame(
> literal_rdd,
> literal_metadata,
> )
> {code}
> This produces the following ~stacktrace:
> {noformat}
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "", line 28, in 
>   File 
> "S:\ZQL\Software\Hotware\spark-1.5.0-bin-hadoop2.6\python\pyspark\sql\context.py",
>  line 408, in createDataFrame
> jdf = self._ssql_ctx.applySchemaToPythonRDD(jrdd.rdd(), schema.json())
>   File 
> "S:\ZQL\Software\Hotware\spark-1.5.0-bin-hadoop2.6\python\lib\py4j-0.8.2.1-src.zip\py4j\java_gateway.py",
>  line 538, in __call__
>   File 
> "S:\ZQL\Software\Hotware\spark-1.5.0-bin-hadoop2.6\python\pyspark\sql\utils.py",
>  line 36, in deco
> return f(*a, **kw)
>   File 
> "S:\ZQL\Software\Hotware\spark-1.5.0-bin-hadoop2.6\python\lib\py4j-0.8.2.1-src.zip\py4j\protocol.py",
>  line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> o757.applySchemaToPythonRDD.
> : java.lang.RuntimeException: Do not support type class scala.Tuple2.
>   at 
> org.apache.spark.sql.types.Metadata$$anonfun$fromJObject$1.apply(Metadata.scala:160)
>   at 
> org.apache.spark.sql.types.Metadata$$anonfun$fromJObject$1.apply(Metadata.scala:127)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at org.apache.spark.sql.types.Metadata$.fromJObject(Metadata.scala:127)
>   at 
> org.apache.spark.sql.types.DataType$.org$apache$spark$sql$types$DataType$$parseStructField(DataType.scala:173)
>   at 
> org.apache.spark.sql.types.DataType$$anonfun$parseDataType$1.apply(DataType.scala:148)
>   at 
> org.apache.spark.sql.types.DataType$$anonfun$parseDataType$1.apply(DataType.scala:148)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at 
> org.apache.spark.sql.types.DataType$.parseDataType(DataType.scala:148)
>   at org.apache.spark.sql.types.DataType$.fromJson(DataType.scala:96)
>   at org.apache.spark.sql.SQLContext.parseDataType(SQLContext.scala:961)
>   at 
> org.apache.spark.

[jira] [Commented] (SPARK-10847) Pyspark - DataFrame - Optional Metadata with `None` triggers cryptic failure

2015-10-02 Thread Shea Parkes (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941953#comment-14941953
 ] 

Shea Parkes commented on SPARK-10847:
-

My apologies, I just read your patch and see you made it work even with 
Pythonic Nulls.  You rule sir; thanks a bunch.

> Pyspark - DataFrame - Optional Metadata with `None` triggers cryptic failure
> 
>
> Key: SPARK-10847
> URL: https://issues.apache.org/jira/browse/SPARK-10847
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 1.5.0
> Environment: Windows 7
> java version "1.8.0_60" (64bit)
> Python 3.4.x
> Standalone cluster mode (not local[n]; a full local cluster)
>Reporter: Shea Parkes
>Priority: Minor
>
> If the optional metadata passed to `pyspark.sql.types.StructField` includes a 
> pythonic `None`, the `pyspark.SparkContext.createDataFrame` will fail with a 
> very cryptic/unhelpful error.
> Here is a minimal reproducible example:
> {code:none}
> # Assumes sc exists
> import pyspark.sql.types as types
> sqlContext = SQLContext(sc)
> literal_metadata = types.StructType([
> types.StructField(
> 'name',
> types.StringType(),
> nullable=True,
> metadata={'comment': 'From accounting system.'}
> ),
> types.StructField(
> 'age',
> types.IntegerType(),
> nullable=True,
> metadata={'comment': None}
> ),
> ])
> literal_rdd = sc.parallelize([
> ['Bob', 34],
> ['Dan', 42],
> ])
> print(literal_rdd.take(2))
> failed_dataframe = sqlContext.createDataFrame(
> literal_rdd,
> literal_metadata,
> )
> {code}
> This produces the following ~stacktrace:
> {noformat}
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "", line 28, in 
>   File 
> "S:\ZQL\Software\Hotware\spark-1.5.0-bin-hadoop2.6\python\pyspark\sql\context.py",
>  line 408, in createDataFrame
> jdf = self._ssql_ctx.applySchemaToPythonRDD(jrdd.rdd(), schema.json())
>   File 
> "S:\ZQL\Software\Hotware\spark-1.5.0-bin-hadoop2.6\python\lib\py4j-0.8.2.1-src.zip\py4j\java_gateway.py",
>  line 538, in __call__
>   File 
> "S:\ZQL\Software\Hotware\spark-1.5.0-bin-hadoop2.6\python\pyspark\sql\utils.py",
>  line 36, in deco
> return f(*a, **kw)
>   File 
> "S:\ZQL\Software\Hotware\spark-1.5.0-bin-hadoop2.6\python\lib\py4j-0.8.2.1-src.zip\py4j\protocol.py",
>  line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> o757.applySchemaToPythonRDD.
> : java.lang.RuntimeException: Do not support type class scala.Tuple2.
>   at 
> org.apache.spark.sql.types.Metadata$$anonfun$fromJObject$1.apply(Metadata.scala:160)
>   at 
> org.apache.spark.sql.types.Metadata$$anonfun$fromJObject$1.apply(Metadata.scala:127)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at org.apache.spark.sql.types.Metadata$.fromJObject(Metadata.scala:127)
>   at 
> org.apache.spark.sql.types.DataType$.org$apache$spark$sql$types$DataType$$parseStructField(DataType.scala:173)
>   at 
> org.apache.spark.sql.types.DataType$$anonfun$parseDataType$1.apply(DataType.scala:148)
>   at 
> org.apache.spark.sql.types.DataType$$anonfun$parseDataType$1.apply(DataType.scala:148)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at 
> org.apache.spark.sql.types.DataType$.parseDataType(DataType.scala:148)
>   at org.apache.spark.sql.types.DataType$.fromJson(DataType.scala:96)
>   at org.apache.spark.sql.SQLContext.parseDataType(SQLContext.scala:961)
>   at 
> org.apache.spark.sql.SQLContext.applySchemaToPythonRDD(SQLContext.scala:970)
>   at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>   at py4j.Gateway.invoke(Gateway.java:259)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at py4j.GatewayConnection.run(GatewayConnection.java:207)
>   at java.lang.Thread.run(Unknown Source)
> {noformat}
> I believe the most important line of the traceback is this one:
>

[jira] [Assigned] (SPARK-10916) YARN executors are launched with the default perm gen size

2015-10-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10916:


Assignee: Apache Spark

> YARN executors are launched with the default perm gen size
> --
>
> Key: SPARK-10916
> URL: https://issues.apache.org/jira/browse/SPARK-10916
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.5.1, 1.6.0
>Reporter: Marcelo Vanzin
>Assignee: Apache Spark
>
> Unlike other backends, the YARN one does not explicitly set the perm gen size 
> for the executor process. That means that, unless the user has explicitly 
> changed it by adding extra java options, executors on YARN are running with 
> 64m of perm gen (I believe) instead of 256m like the other backends.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10916) YARN executors are launched with the default perm gen size

2015-10-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941958#comment-14941958
 ] 

Apache Spark commented on SPARK-10916:
--

User 'vanzin' has created a pull request for this issue:
https://github.com/apache/spark/pull/8970

> YARN executors are launched with the default perm gen size
> --
>
> Key: SPARK-10916
> URL: https://issues.apache.org/jira/browse/SPARK-10916
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.5.1, 1.6.0
>Reporter: Marcelo Vanzin
>
> Unlike other backends, the YARN one does not explicitly set the perm gen size 
> for the executor process. That means that, unless the user has explicitly 
> changed it by adding extra java options, executors on YARN are running with 
> 64m of perm gen (I believe) instead of 256m like the other backends.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10916) YARN executors are launched with the default perm gen size

2015-10-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10916:


Assignee: (was: Apache Spark)

> YARN executors are launched with the default perm gen size
> --
>
> Key: SPARK-10916
> URL: https://issues.apache.org/jira/browse/SPARK-10916
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.5.1, 1.6.0
>Reporter: Marcelo Vanzin
>
> Unlike other backends, the YARN one does not explicitly set the perm gen size 
> for the executor process. That means that, unless the user has explicitly 
> changed it by adding extra java options, executors on YARN are running with 
> 64m of perm gen (I believe) instead of 256m like the other backends.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9487) Use the same num. worker threads in Scala/Python unit tests

2015-10-02 Thread Evan Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941961#comment-14941961
 ] 

Evan Chen commented on SPARK-9487:
--

Hey Xiangrui,

What would be the preferred number of worker threads? Should we set all of them to 
local[2] to stay consistent with the Scala/Java side?

Thanks
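
For context, a small sketch (hypothetical code, not from this thread) of why the thread count matters when results depend on partition IDs:
{code}
import org.apache.spark.{SparkConf, SparkContext}

// local[2] gives 2 default partitions, local[4] gives 4, so anything keyed on
// partition IDs (e.g. per-partition RNG seeds) can differ between the two.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("partition-demo"))
val sizes = sc.parallelize(1 to 8)
  .mapPartitionsWithIndex((pid, it) => Iterator(pid -> it.size))
  .collect()
println(sizes.mkString(", "))  // (0,4), (1,4) here; four smaller partitions under local[4]
sc.stop()
{code}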

> Use the same num. worker threads in Scala/Python unit tests
> ---
>
> Key: SPARK-9487
> URL: https://issues.apache.org/jira/browse/SPARK-9487
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Spark Core, SQL, Tests
>Affects Versions: 1.5.0
>Reporter: Xiangrui Meng
>
> In Python we use `local[4]` for unit tests, while in Scala/Java we use 
> `local[2]` and `local` for some unit tests in SQL, MLLib, and other 
> components. If the operation depends on partition IDs, e.g., random number 
> generator, this will lead to different result in Python and Scala/Java. It 
> would be nice to use the same number in all unit tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10917) Improve performance of complex types in columnar cache

2015-10-02 Thread Davies Liu (JIRA)
Davies Liu created SPARK-10917:
--

 Summary: Improve performance of complex types in columnar cache
 Key: SPARK-10917
 URL: https://issues.apache.org/jira/browse/SPARK-10917
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Davies Liu
Assignee: Davies Liu


Complex types are very slow in the columnar cache because of the Kryo 
serializer.
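
A sketch of the slow path being described, assuming the spark-shell's {{sc}} and {{sqlContext}}: caching a DataFrame with a complex (array) column goes through the in-memory columnar store.
{code}
import org.apache.spark.sql.functions.array
import sqlContext.implicits._

// Build a DataFrame with an array column, then cache it; the first count
// materializes the in-memory columnar cache (the slow step for complex types).
val df = sc.parallelize(1 to 100000).map(Tuple1(_)).toDF("x")
  .select(array($"x", $"x").as("xs"))
df.cache().count()
df.count()  // later scans read the cached columnar data
{code}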



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10917) Improve performance of complex types in columnar cache

2015-10-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14942014#comment-14942014
 ] 

Apache Spark commented on SPARK-10917:
--

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/8971

> Improve performance of complex types in columnar cache
> --
>
> Key: SPARK-10917
> URL: https://issues.apache.org/jira/browse/SPARK-10917
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
>
> Complex types are really really slow in columnar cache, because of kryo 
> serializer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10917) Improve performance of complex types in columnar cache

2015-10-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10917:


Assignee: Davies Liu  (was: Apache Spark)

> Improve performance of complex types in columnar cache
> --
>
> Key: SPARK-10917
> URL: https://issues.apache.org/jira/browse/SPARK-10917
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
>
> Complex types are really really slow in columnar cache, because of kryo 
> serializer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10917) Improve performance of complex types in columnar cache

2015-10-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10917:


Assignee: Apache Spark  (was: Davies Liu)

> Improve performance of complex types in columnar cache
> --
>
> Key: SPARK-10917
> URL: https://issues.apache.org/jira/browse/SPARK-10917
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Apache Spark
>
> Complex types are really really slow in columnar cache, because of kryo 
> serializer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10474) TungstenAggregation cannot acquire memory for pointer array after switching to sort-based

2015-10-02 Thread Naden Franciscus (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14942096#comment-14942096
 ] 

Naden Franciscus commented on SPARK-10474:
--

I have tried setting spark.buffer.pageSize to both 1 MB and 64 MB and it makes no 
difference.

It also tries to acquire 33554432 bytes (32 MB) of memory in both cases.

Can we please reopen this ticket?

> TungstenAggregation cannot acquire memory for pointer array after switching 
> to sort-based
> -
>
> Key: SPARK-10474
> URL: https://issues.apache.org/jira/browse/SPARK-10474
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Yi Zhou
>Assignee: Andrew Or
>Priority: Blocker
> Fix For: 1.5.1, 1.6.0
>
>
> In an aggregation case, a lost task happened with the error below.
> {code}
>  java.io.IOException: Could not acquire 65536 bytes of memory
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.initializeForWriting(UnsafeExternalSorter.java:169)
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:220)
> at 
> org.apache.spark.sql.execution.UnsafeKVExternalSorter.(UnsafeKVExternalSorter.java:126)
> at 
> org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.destructAndCreateExternalSorter(UnsafeFixedWidthAggregationMap.java:257)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.switchToSortBasedAggregation(TungstenAggregationIterator.scala:435)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:379)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.start(TungstenAggregationIterator.scala:622)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.org$apache$spark$sql$execution$aggregate$TungstenAggregate$$anonfun$$executePartition$1(TungstenAggregate.scala:110)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)
> at 
> org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:64)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Key SQL Query
> {code:sql}
> INSERT INTO TABLE test_table
> SELECT
>   ss.ss_customer_sk AS cid,
>   count(CASE WHEN i.i_class_id=1  THEN 1 ELSE NULL END) AS id1,
>   count(CASE WHEN i.i_class_id=3  THEN 1 ELSE NULL END) AS id3,
>   count(CASE WHEN i.i_class_id=5  THEN 1 ELSE NULL END) AS id5,
>   count(CASE WHEN i.i_class_id=7  THEN 1 ELSE NULL END) AS id7,
>   count(CASE WHEN i.i_class_id=9  THEN 1 ELSE NULL END) AS id9,
>   count(CASE WHEN i.i_class_id=11 THEN 1 ELSE NULL END) AS id11,
>   count(CASE WHEN i.i_class_id=13 THEN 1 ELSE NULL END) AS id13,
>   count(CASE WHEN i.i_class_id=15 THEN 1 ELSE NULL END) AS id15,
>   count(CASE WHEN i.i_class_id=2  THEN 1 ELSE NULL END) AS id2,
>   count(CASE WHEN i.i_class_id=4  THEN 1 ELSE NULL END) AS id4,
>   count(CASE WHEN i.i_class_id=6  THEN 1 ELSE NULL END) AS id6,
>   count(CASE WHEN i.i_class_id=8  THEN 1 ELSE NULL END) AS id8,
>   count(CASE WHEN i.i_class_id=10 THEN 1 ELSE NULL END) AS id10,
>   count(CASE WHEN i.i_class_id=14 THEN 1 ELSE NULL END) AS id14,
>   count(CASE WHEN i.i_class_id=16 THEN 1 ELSE NULL END) AS id16
> FROM store_sales ss
> INNER JOIN item i ON ss.ss_item_sk = i.i_item_sk
> WHERE i.i_category IN ('Books')
> AND ss.ss_customer_sk IS NOT NULL
> GROUP BY ss.ss_customer_sk
> HAVING count(ss.ss_item_sk) > 5
> {code}
> Note:
> the store_sales is a big fact table and item 

[jira] [Commented] (SPARK-6270) Standalone Master hangs when streaming job completes and event logging is enabled

2015-10-02 Thread Jim Haughwout (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14942108#comment-14942108
 ] 

Jim Haughwout commented on SPARK-6270:
--

[~tdas]: Was this fixed in 1.5.0? This is affecting our team as well, turning what 
should be zero-impact stops and re-submits into long outages while we wait for 
the Master to settle.

> Standalone Master hangs when streaming job completes and event logging is 
> enabled
> -
>
> Key: SPARK-6270
> URL: https://issues.apache.org/jira/browse/SPARK-6270
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Streaming
>Affects Versions: 1.2.0, 1.2.1, 1.3.0
>Reporter: Tathagata Das
>Priority: Critical
>
> If the event logging is enabled, the Spark Standalone Master tries to 
> recreate the web UI of a completed Spark application from its event logs. 
> However if this event log is huge (e.g. for a Spark Streaming application), 
> then the master hangs in its attempt to read and recreate the web UI. This 
> hang causes the whole standalone cluster to be unusable. 
> The workaround is to disable event logging.
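
That workaround in configuration form (a sketch using the standard {{spark.eventLog.enabled}} property):
{code}
import org.apache.spark.SparkConf

// Workaround from the description: turn off event logging entirely, so the
// Master never tries to rebuild a completed application's UI from huge logs.
val conf = new SparkConf().set("spark.eventLog.enabled", "false")
{code}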



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org