[jira] [Resolved] (SPARK-20176) Spark Dataframe UDAF issue

2017-04-04 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-20176.
---
Resolution: Cannot Reproduce

> Spark Dataframe UDAF issue
> --
>
> Key: SPARK-20176
> URL: https://issues.apache.org/jira/browse/SPARK-20176
> Project: Spark
>  Issue Type: IT Help
>  Components: Spark Core
>Affects Versions: 2.0.2
>Reporter: Dinesh Man Amatya
>
> Getting the following error in a custom UDAF:
> Error while decoding: java.util.concurrent.ExecutionException: 
> java.lang.Exception: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 58, Column 33: Incompatible expression types "boolean" and "java.lang.Boolean"
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificSafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificSafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.codegen.BaseProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private MutableRow mutableRow;
> /* 009 */   private Object[] values;
> /* 010 */   private Object[] values1;
> /* 011 */   private org.apache.spark.sql.types.StructType schema;
> /* 012 */   private org.apache.spark.sql.types.StructType schema1;
> /* 013 */
> /* 014 */
> /* 015 */   public SpecificSafeProjection(Object[] references) {
> /* 016 */ this.references = references;
> /* 017 */ mutableRow = (MutableRow) references[references.length - 1];
> /* 018 */
> /* 019 */
> /* 020 */ this.schema = (org.apache.spark.sql.types.StructType) 
> references[0];
> /* 021 */ this.schema1 = (org.apache.spark.sql.types.StructType) 
> references[1];
> /* 022 */   }
> /* 023 */
> /* 024 */   public java.lang.Object apply(java.lang.Object _i) {
> /* 025 */ InternalRow i = (InternalRow) _i;
> /* 026 */
> /* 027 */ values = new Object[2];
> /* 028 */
> /* 029 */ boolean isNull2 = i.isNullAt(0);
> /* 030 */ UTF8String value2 = isNull2 ? null : (i.getUTF8String(0));
> /* 031 */
> /* 032 */ boolean isNull1 = isNull2;
> /* 033 */ final java.lang.String value1 = isNull1 ? null : 
> (java.lang.String) value2.toString();
> /* 034 */ isNull1 = value1 == null;
> /* 035 */ if (isNull1) {
> /* 036 */   values[0] = null;
> /* 037 */ } else {
> /* 038 */   values[0] = value1;
> /* 039 */ }
> /* 040 */
> /* 041 */ boolean isNull5 = i.isNullAt(1);
> /* 042 */ InternalRow value5 = isNull5 ? null : (i.getStruct(1, 2));
> /* 043 */ boolean isNull3 = false;
> /* 044 */ org.apache.spark.sql.Row value3 = null;
> /* 045 */ if (!false && isNull5) {
> /* 046 */
> /* 047 */   final org.apache.spark.sql.Row value6 = null;
> /* 048 */   isNull3 = true;
> /* 049 */   value3 = value6;
> /* 050 */ } else {
> /* 051 */
> /* 052 */   values1 = new Object[2];
> /* 053 */
> /* 054 */   boolean isNull10 = i.isNullAt(1);
> /* 055 */   InternalRow value10 = isNull10 ? null : (i.getStruct(1, 2));
> /* 056 */
> /* 057 */   boolean isNull9 = isNull10 || false;
> /* 058 */   final boolean value9 = isNull9 ? false : (Boolean) 
> value10.isNullAt(0);
> /* 059 */   boolean isNull8 = false;
> /* 060 */   double value8 = -1.0;
> /* 061 */   if (!isNull9 && value9) {
> /* 062 */
> /* 063 */ final double value12 = -1.0;
> /* 064 */ isNull8 = true;
> /* 065 */ value8 = value12;
> /* 066 */   } else {
> /* 067 */
> /* 068 */ boolean isNull14 = i.isNullAt(1);
> /* 069 */ InternalRow value14 = isNull14 ? null : (i.getStruct(1, 2));
> /* 070 */ boolean isNull13 = isNull14;
> /* 071 */ double value13 = -1.0;
> /* 072 */
> /* 073 */ if (!isNull14) {
> /* 074 */
> /* 075 */   if (value14.isNullAt(0)) {
> /* 076 */ isNull13 = true;
> /* 077 */   } else {
> /* 078 */ value13 = value14.getDouble(0);
> /* 079 */   }
> /* 080 */
> /* 081 */ }
> /* 082 */ isNull8 = isNull13;
> /* 083 */ value8 = value13;
> /* 084 */   }
> /* 085 */   if (isNull8) {
> /* 086 */ values1[0] = null;
> /* 087 */   } else {
> /* 088 */ values1[0] = value8;
> /* 089 */   }
> /* 090 */
> /* 091 */   boolean isNull17 = i.isNullAt(1);
> /* 092 */   InternalRow value17 = isNull17 ? null : (i.getStruct(1, 2));
> /* 093 */
> /* 094 */   boolean isNull16 = isNull17 || false;
> /* 095 */   final boolean value16 = isNull16 ? false : (Boolean) 
> value17.isNullAt(1);
> /* 096 */   boolean isNull15 = false;
> /* 097 */   double value15 = -1.0;
> /* 098 */   if (!isNull16 && value16) {
> /* 099 */
> /* 100 */ final double value19 = -1.0;
> /* 101 */ isNull15 = true;
> /* 102 */ 

[jira] [Commented] (SPARK-17742) Spark Launcher does not get failed state in Listener

2017-04-04 Thread Daan Van den Nest (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954698#comment-15954698
 ] 

Daan Van den Nest commented on SPARK-17742:
---

Hi, is there already a fix for this issue? It effectively renders the 
SparkLauncher unusable. Or does anybody at least know a workaround for it? Any 
thoughts, [~anshbansal], [~vanzin]? Thanks!

> Spark Launcher does not get failed state in Listener 
> -
>
> Key: SPARK-17742
> URL: https://issues.apache.org/jira/browse/SPARK-17742
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.0.0
>Reporter: Aseem Bansal
>
> I tried to launch an application using the below code. This is dummy code to 
> reproduce the problem. I tried exiting spark with status -1, throwing an 
> exception etc. but in no case did the listener give me failed status. But if 
> a spark job returns -1 or throws an exception from the main method it should 
> be considered as a failure. 
> {code}
> package com.example;
> import org.apache.spark.launcher.SparkAppHandle;
> import org.apache.spark.launcher.SparkLauncher;
> import java.io.IOException;
> public class Main2 {
> public static void main(String[] args) throws IOException, 
> InterruptedException {
> SparkLauncher launcher = new SparkLauncher()
> .setSparkHome("/opt/spark2")
> 
> .setAppResource("/home/aseem/projects/testsparkjob/build/libs/testsparkjob-1.0-SNAPSHOT.jar")
> .setMainClass("com.example.Main")
> .setMaster("local[2]");
> launcher.startApplication(new MyListener());
> Thread.sleep(1000 * 60);
> }
> }
> class MyListener implements SparkAppHandle.Listener {
> @Override
> public void stateChanged(SparkAppHandle handle) {
> System.out.println("state changed " + handle.getState());
> }
> @Override
> public void infoChanged(SparkAppHandle handle) {
> System.out.println("info changed " + handle.getState());
> }
> }
> {code}
> The spark job is 
> {code}
> package com.example;
> import org.apache.spark.sql.SparkSession;
> import java.io.IOException;
> public class Main {
> public static void main(String[] args) throws IOException {
> SparkSession sparkSession = SparkSession
> .builder()
> .appName("" + System.currentTimeMillis())
> .getOrCreate();
> try {
> for (int i = 0; i < 15; i++) {
> Thread.sleep(1000);
> System.out.println("sleeping 1");
> }
> } catch (InterruptedException e) {
> e.printStackTrace();
> }
> //sparkSession.stop();
> System.exit(-1);
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17742) Spark Launcher does not get failed state in Listener

2017-04-04 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954714#comment-15954714
 ] 

Aseem Bansal commented on SPARK-17742:
--

[~daanvdn] We ended up using Kafka messages to tell the web app that was using 
the launcher to launch the job whether the job had completed or failed. We 
dropped the Launcher's states since they are broken.
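
For anyone in the same situation, here is a rough sketch of that kind of side 
channel (the helper, topic name, and broker address are hypothetical, not from 
this thread): the job reports its own outcome over Kafka instead of relying on 
the launcher's state callbacks.

{code}
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Hypothetical helper: publish a single status message for a job run.
def reportStatus(jobId: String, status: String): Unit = {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092") // assumed broker address
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  val producer = new KafkaProducer[String, String](props)
  try producer.send(new ProducerRecord("spark-job-status", jobId, status)).get()
  finally producer.close()
}

// In the Spark job's main(), report explicitly, since the listener may never see FAILED:
// try { runJob(); reportStatus(jobId, "SUCCEEDED") }
// catch { case e: Exception => reportStatus(jobId, "FAILED"); throw e }
{code}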

> Spark Launcher does not get failed state in Listener 
> -
>
> Key: SPARK-17742
> URL: https://issues.apache.org/jira/browse/SPARK-17742
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.0.0
>Reporter: Aseem Bansal
>
> I tried to launch an application using the below code. This is dummy code to 
> reproduce the problem. I tried exiting spark with status -1, throwing an 
> exception etc. but in no case did the listener give me failed status. But if 
> a spark job returns -1 or throws an exception from the main method it should 
> be considered as a failure. 
> {code}
> package com.example;
> import org.apache.spark.launcher.SparkAppHandle;
> import org.apache.spark.launcher.SparkLauncher;
> import java.io.IOException;
> public class Main2 {
> public static void main(String[] args) throws IOException, 
> InterruptedException {
> SparkLauncher launcher = new SparkLauncher()
> .setSparkHome("/opt/spark2")
> 
> .setAppResource("/home/aseem/projects/testsparkjob/build/libs/testsparkjob-1.0-SNAPSHOT.jar")
> .setMainClass("com.example.Main")
> .setMaster("local[2]");
> launcher.startApplication(new MyListener());
> Thread.sleep(1000 * 60);
> }
> }
> class MyListener implements SparkAppHandle.Listener {
> @Override
> public void stateChanged(SparkAppHandle handle) {
> System.out.println("state changed " + handle.getState());
> }
> @Override
> public void infoChanged(SparkAppHandle handle) {
> System.out.println("info changed " + handle.getState());
> }
> }
> {code}
> The spark job is 
> {code}
> package com.example;
> import org.apache.spark.sql.SparkSession;
> import java.io.IOException;
> public class Main {
> public static void main(String[] args) throws IOException {
> SparkSession sparkSession = SparkSession
> .builder()
> .appName("" + System.currentTimeMillis())
> .getOrCreate();
> try {
> for (int i = 0; i < 15; i++) {
> Thread.sleep(1000);
> System.out.println("sleeping 1");
> }
> } catch (InterruptedException e) {
> e.printStackTrace();
> }
> //sparkSession.stop();
> System.exit(-1);
> }
> }
> {code}






[jira] [Commented] (SPARK-20199) GradientBoostedTreesModel doesn't have Column Sampling Rate Parameter

2017-04-04 Thread pralabhkumar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954796#comment-15954796
 ] 

pralabhkumar commented on SPARK-20199:
--

Hi,

GBM internally uses Random Forest.

GradientBoostedTrees has a boost method that calls DecisionTreeRegressor's train 
method to build the trees:

private[ml] def train(data: RDD[LabeledPoint],
    oldStrategy: OldStrategy): DecisionTreeRegressionModel = {
  val instr = Instrumentation.create(this, data)
  instr.logParams(params: _*)

  val trees = RandomForest.run(data, oldStrategy, numTrees = 1,
    featureSubsetStrategy = "all",
    seed = $(seed), instr = Some(instr), parentUID = Some(uid))

  val m = trees.head.asInstanceOf[DecisionTreeRegressionModel]
  instr.logSuccess(m)
  m
}

Here featureSubsetStrategy is hardcoded to "all"; is there any specific reason 
for that? Shouldn't the property be exposed so the user can choose the 
featureSubsetStrategy from "auto", "all", "sqrt", "log2", or "onethird"?
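
As a sketch only (not an actual patch), assuming a featureSubsetStrategy Param 
were added to the estimator, the hardcoded value could be threaded through like 
this:

{code}
// Sketch: replace the hardcoded "all" with a user-settable Param (hypothetical here).
val trees = RandomForest.run(data, oldStrategy, numTrees = 1,
  featureSubsetStrategy = $(featureSubsetStrategy), // "auto", "all", "sqrt", "log2", "onethird"
  seed = $(seed), instr = Some(instr), parentUID = Some(uid))
{code}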
 

> GradientBoostedTreesModel doesn't have Column Sampling Rate Parameter
> ---
>
> Key: SPARK-20199
> URL: https://issues.apache.org/jira/browse/SPARK-20199
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib
>Affects Versions: 2.1.0
>Reporter: pralabhkumar
>Priority: Minor
>
> Spark GradientBoostedTreesModel doesn't have a column sampling rate parameter. 
> This parameter is available in H2O and XGBoost. 
> Sample from H2O.ai: 
> gbmParams._col_sample_rate
> Please provide the parameter.






[jira] [Updated] (SPARK-20155) CSV-files with quoted quotes can't be parsed, if delimiter follows quoted quote

2017-04-04 Thread Rick Moritz (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rick Moritz updated SPARK-20155:

Summary: CSV-files with quoted quotes can't be parsed, if delimiter follows 
quoted quote  (was: CSV-files with quoted quotes can't be parsed, if delimiter 
followes quoted quote)

> CSV-files with quoted quotes can't be parsed, if delimiter follows quoted 
> quote
> ---
>
> Key: SPARK-20155
> URL: https://issues.apache.org/jira/browse/SPARK-20155
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 2.0.0
>Reporter: Rick Moritz
>
> According to:
> https://tools.ietf.org/html/rfc4180#section-2
> 7.  If double-quotes are used to enclose fields, then a double-quote
>appearing inside a field must be escaped by preceding it with
>another double quote.  For example:
>"aaa","b""bb","ccc"
> This currently works as is, but the following does not:
>  "aaa","b""b,b","ccc"
> while  "aaa","b\"b,b","ccc" does get parsed.
> I assume this happens because quotes are currently being parsed in pairs, 
> and that somehow ends up unquoting the delimiter.
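
A small reproduction sketch of the three inputs above (the file path and the 
plain CSV reader defaults are illustrative, not from the report):

{code}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

// assumes an existing SparkSession named spark
val lines = Seq(
  "\"aaa\",\"b\"\"bb\",\"ccc\"",  // RFC 4180 quoted quote, no delimiter after it: parses
  "\"aaa\",\"b\"\"b,b\",\"ccc\"", // quoted quote followed by the delimiter: fails to parse
  "\"aaa\",\"b\\\"b,b\",\"ccc\""  // backslash-escaped quote: parses
)
Files.write(Paths.get("/tmp/quoted-quotes.csv"),
  lines.mkString("\n").getBytes(StandardCharsets.UTF_8))

spark.read.option("header", "false").csv("/tmp/quoted-quotes.csv").show(false)
{code}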






[jira] [Assigned] (SPARK-20156) Java String toLowerCase "Turkish locale bug" causes Spark problems

2017-04-04 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-20156:
-

Assignee: Sean Owen
 Summary: Java String toLowerCase "Turkish locale bug" causes Spark 
problems  (was: Local dependent library used for upper and lowercase 
conversions.)

I retitled this; please refer to things like 
http://mattryall.net/blog/2009/02/the-infamous-turkish-locale-bug for 
back-story on this particular issue. 

I believe the best change is to make all case-changing operations use 
Locale.ROOT.
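
For illustration, a minimal Scala sketch of the difference (the sample strings 
are the ones reported in this issue):

{code}
import java.util.Locale

val tr = new Locale("tr", "TR")
// Under a Turkish default locale the dotted/dotless-i rules apply:
println("SERDEINFO".toLowerCase(tr))            // serdeınfo
println("uniquetable".toUpperCase(tr))          // UNİQUETABLE
// Locale.ROOT keeps case conversion locale-independent:
println("SERDEINFO".toLowerCase(Locale.ROOT))   // serdeinfo
println("uniquetable".toUpperCase(Locale.ROOT)) // UNIQUETABLE
{code}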

> Java String toLowerCase "Turkish locale bug" causes Spark problems
> --
>
> Key: SPARK-20156
> URL: https://issues.apache.org/jira/browse/SPARK-20156
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 2.1.0
> Environment: Ubuntu 16.04
> Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_121)
>Reporter: Serkan Taş
>Assignee: Sean Owen
> Attachments: sprk_shell.txt
>
>
> If the regional setting of the operating system is Turkish, the famous Java 
> locale problem occurs (https://jira.atlassian.com/browse/CONF-5931 or 
> https://issues.apache.org/jira/browse/AVRO-1493). 
> e.g.: 
> "SERDEINFO" lowers to "serdeınfo"
> "uniquetable" uppers to "UNİQUETABLE"
> workaround: 
> add -Duser.country=US -Duser.language=en to the end of the line 
> SPARK_SUBMIT_OPTS="$SPARK_SUBMIT_OPTS -Dscala.usejavacp=true"
> in spark-shell.sh






[jira] [Commented] (SPARK-20156) Java String toLowerCase "Turkish locale bug" causes Spark problems

2017-04-04 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954850#comment-15954850
 ] 

Apache Spark commented on SPARK-20156:
--

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/17527

> Java String toLowerCase "Turkish locale bug" causes Spark problems
> --
>
> Key: SPARK-20156
> URL: https://issues.apache.org/jira/browse/SPARK-20156
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 2.1.0
> Environment: Ubuntu 16.04
> Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_121)
>Reporter: Serkan Taş
>Assignee: Sean Owen
> Attachments: sprk_shell.txt
>
>
> If the regional setting of the operating system is Turkish, the famous Java 
> locale problem occurs (https://jira.atlassian.com/browse/CONF-5931 or 
> https://issues.apache.org/jira/browse/AVRO-1493). 
> e.g.: 
> "SERDEINFO" lowers to "serdeınfo"
> "uniquetable" uppers to "UNİQUETABLE"
> workaround: 
> add -Duser.country=US -Duser.language=en to the end of the line 
> SPARK_SUBMIT_OPTS="$SPARK_SUBMIT_OPTS -Dscala.usejavacp=true"
> in spark-shell.sh






[jira] [Assigned] (SPARK-20156) Java String toLowerCase "Turkish locale bug" causes Spark problems

2017-04-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20156:


Assignee: Apache Spark  (was: Sean Owen)

> Java String toLowerCase "Turkish locale bug" causes Spark problems
> --
>
> Key: SPARK-20156
> URL: https://issues.apache.org/jira/browse/SPARK-20156
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 2.1.0
> Environment: Ubuntu 16.04
> Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_121)
>Reporter: Serkan Taş
>Assignee: Apache Spark
> Attachments: sprk_shell.txt
>
>
> If the regional setting of the operating system is Turkish, the famous Java 
> locale problem occurs (https://jira.atlassian.com/browse/CONF-5931 or 
> https://issues.apache.org/jira/browse/AVRO-1493). 
> e.g.: 
> "SERDEINFO" lowers to "serdeınfo"
> "uniquetable" uppers to "UNİQUETABLE"
> workaround: 
> add -Duser.country=US -Duser.language=en to the end of the line 
> SPARK_SUBMIT_OPTS="$SPARK_SUBMIT_OPTS -Dscala.usejavacp=true"
> in spark-shell.sh






[jira] [Assigned] (SPARK-20156) Java String toLowerCase "Turkish locale bug" causes Spark problems

2017-04-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20156:


Assignee: Sean Owen  (was: Apache Spark)

> Java String toLowerCase "Turkish locale bug" causes Spark problems
> --
>
> Key: SPARK-20156
> URL: https://issues.apache.org/jira/browse/SPARK-20156
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 2.1.0
> Environment: Ubuntu 16.04
> Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_121)
>Reporter: Serkan Taş
>Assignee: Sean Owen
> Attachments: sprk_shell.txt
>
>
> If the regional setting of the operating system is Turkish, the famous Java 
> locale problem occurs (https://jira.atlassian.com/browse/CONF-5931 or 
> https://issues.apache.org/jira/browse/AVRO-1493). 
> e.g.: 
> "SERDEINFO" lowers to "serdeınfo"
> "uniquetable" uppers to "UNİQUETABLE"
> workaround: 
> add -Duser.country=US -Duser.language=en to the end of the line 
> SPARK_SUBMIT_OPTS="$SPARK_SUBMIT_OPTS -Dscala.usejavacp=true"
> in spark-shell.sh






[jira] [Resolved] (SPARK-20190) '/applications/[app-id]/jobs' in rest api,status should be [running|succeeded|failed|unknown]

2017-04-04 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-20190.
---
   Resolution: Fixed
Fix Version/s: 2.2.0
   2.1.1

Issue resolved by pull request 17507
[https://github.com/apache/spark/pull/17507]

> '/applications/[app-id]/jobs' in rest api,status should be 
> [running|succeeded|failed|unknown]
> -
>
> Key: SPARK-20190
> URL: https://issues.apache.org/jira/browse/SPARK-20190
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.1.0
>Reporter: guoxiaolongzte
>Assignee: guoxiaolongzte
>Priority: Trivial
> Fix For: 2.1.1, 2.2.0
>
>
> For '/applications/[app-id]/jobs' in the REST API, the documented status 
> values should be '[running|succeeded|failed|unknown]'.
> The documentation currently lists '[complete|succeeded|failed]',
> but for '/applications/[app-id]/jobs?status=complete' the server returns 'HTTP 
> ERROR 404'.
> Added '?status=running' and '?status=unknown'.
> The code:
> public enum JobExecutionStatus {
>   RUNNING,
>   SUCCEEDED,
>   FAILED,
>   UNKNOWN;
> }






[jira] [Commented] (SPARK-20193) Selecting empty struct causes ExpressionEncoder error.

2017-04-04 Thread Adrian Ionescu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954878#comment-15954878
 ] 

Adrian Ionescu commented on SPARK-20193:


Thanks for the workaround, but, sorry, this is not good enough. I agree that an 
empty struct is not very useful, but if it's not supported then the docs should 
say so and the error message should be clear.

In my case, I'm building this struct dynamically, based on user input, so it 
may or may not be empty. Right now I have to special case it, but that 
introduces unnecessary complexity and makes the code less readable.
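
A minimal sketch of that kind of special-casing, assuming a hypothetical helper 
and an agreed-upon placeholder for the empty case (none of this is from the 
ticket):

{code}
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{lit, struct}

// Hypothetical helper: only call struct() when the dynamically built list is non-empty.
def maybeStruct(cols: Seq[Column]): Column =
  if (cols.nonEmpty) struct(cols: _*)
  else lit(null) // placeholder; whatever downstream code agrees to treat as "empty"
{code}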


> Selecting empty struct causes ExpressionEncoder error.
> --
>
> Key: SPARK-20193
> URL: https://issues.apache.org/jira/browse/SPARK-20193
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: Adrian Ionescu
>  Labels: struct
>
> {{def struct(cols: Column*): Column}}
> Given the above signature and the lack of any note in the docs saying that a 
> struct with no columns is not supported, I would expect the following to work:
> {{spark.range(3).select(col("id"), struct().as("empty_struct")).collect}}
> However, this results in:
> {quote}
> java.lang.AssertionError: assertion failed: each serializer expression should 
> contains at least one `BoundReference`
>   at scala.Predef$.assert(Predef.scala:170)
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$$anonfun$11.apply(ExpressionEncoder.scala:240)
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$$anonfun$11.apply(ExpressionEncoder.scala:238)
>   at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
>   at scala.collection.immutable.List.flatMap(List.scala:344)
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.(ExpressionEncoder.scala:238)
>   at 
> org.apache.spark.sql.catalyst.encoders.RowEncoder$.apply(RowEncoder.scala:63)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
>   at 
> org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:2837)
>   at org.apache.spark.sql.Dataset.select(Dataset.scala:1131)
>   ... 39 elided
> {quote}






[jira] [Updated] (SPARK-20193) Selecting empty struct causes ExpressionEncoder error.

2017-04-04 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-20193:
--
 Labels:   (was: struct)
   Priority: Minor  (was: Major)
Component/s: (was: Spark Core)
 SQL
 Documentation
 Issue Type: Improvement  (was: Bug)

This sounds like a doc issue as it's not clear that this is meaningful to 
support.

> Selecting empty struct causes ExpressionEncoder error.
> --
>
> Key: SPARK-20193
> URL: https://issues.apache.org/jira/browse/SPARK-20193
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, SQL
>Affects Versions: 2.1.0
>Reporter: Adrian Ionescu
>Priority: Minor
>
> {{def struct(cols: Column*): Column}}
> Given the above signature and the lack of any note in the docs saying that a 
> struct with no columns is not supported, I would expect the following to work:
> {{spark.range(3).select(col("id"), struct().as("empty_struct")).collect}}
> However, this results in:
> {quote}
> java.lang.AssertionError: assertion failed: each serializer expression should 
> contains at least one `BoundReference`
>   at scala.Predef$.assert(Predef.scala:170)
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$$anonfun$11.apply(ExpressionEncoder.scala:240)
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$$anonfun$11.apply(ExpressionEncoder.scala:238)
>   at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
>   at scala.collection.immutable.List.flatMap(List.scala:344)
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.(ExpressionEncoder.scala:238)
>   at 
> org.apache.spark.sql.catalyst.encoders.RowEncoder$.apply(RowEncoder.scala:63)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
>   at 
> org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:2837)
>   at org.apache.spark.sql.Dataset.select(Dataset.scala:1131)
>   ... 39 elided
> {quote}






[jira] [Commented] (SPARK-17742) Spark Launcher does not get failed state in Listener

2017-04-04 Thread Daan Van den Nest (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954903#comment-15954903
 ] 

Daan Van den Nest commented on SPARK-17742:
---

Thanks for the advice [~anshbansal], I'll give that a try. 
A question to the Spark developers: is a fix for this bug already on the 
roadmap for an upcoming release of Spark?



> Spark Launcher does not get failed state in Listener 
> -
>
> Key: SPARK-17742
> URL: https://issues.apache.org/jira/browse/SPARK-17742
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.0.0
>Reporter: Aseem Bansal
>
> I tried to launch an application using the below code. This is dummy code to 
> reproduce the problem. I tried exiting spark with status -1, throwing an 
> exception etc. but in no case did the listener give me failed status. But if 
> a spark job returns -1 or throws an exception from the main method it should 
> be considered as a failure. 
> {code}
> package com.example;
> import org.apache.spark.launcher.SparkAppHandle;
> import org.apache.spark.launcher.SparkLauncher;
> import java.io.IOException;
> public class Main2 {
> public static void main(String[] args) throws IOException, 
> InterruptedException {
> SparkLauncher launcher = new SparkLauncher()
> .setSparkHome("/opt/spark2")
> 
> .setAppResource("/home/aseem/projects/testsparkjob/build/libs/testsparkjob-1.0-SNAPSHOT.jar")
> .setMainClass("com.example.Main")
> .setMaster("local[2]");
> launcher.startApplication(new MyListener());
> Thread.sleep(1000 * 60);
> }
> }
> class MyListener implements SparkAppHandle.Listener {
> @Override
> public void stateChanged(SparkAppHandle handle) {
> System.out.println("state changed " + handle.getState());
> }
> @Override
> public void infoChanged(SparkAppHandle handle) {
> System.out.println("info changed " + handle.getState());
> }
> }
> {code}
> The spark job is 
> {code}
> package com.example;
> import org.apache.spark.sql.SparkSession;
> import java.io.IOException;
> public class Main {
> public static void main(String[] args) throws IOException {
> SparkSession sparkSession = SparkSession
> .builder()
> .appName("" + System.currentTimeMillis())
> .getOrCreate();
> try {
> for (int i = 0; i < 15; i++) {
> Thread.sleep(1000);
> System.out.println("sleeping 1");
> }
> } catch (InterruptedException e) {
> e.printStackTrace();
> }
> //sparkSession.stop();
> System.exit(-1);
> }
> }
> {code}






[jira] [Commented] (SPARK-20203) Change default maxPatternLength value to Int.MaxValue in PrefixSpan

2017-04-04 Thread Nick Pentreath (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954904#comment-15954904
 ] 

Nick Pentreath commented on SPARK-20203:


I see there is a comment in the code that says: {{// TODO: support unbounded 
pattern length when maxPatternLength = 0}}. But the same thing can essentially 
be achieved by setting the pattern length to {{Int.MaxValue}}, as Sean has 
previously said, so I don't really think this is a valid work item (in fact 
that comment should probably be removed).

Is an unbounded default really better (or worse) from an API / user-facing 
perspective? There are arguments either way, but to be honest I see nothing 
compelling enough to warrant a change here.
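
For reference, a minimal sketch of setting the bound explicitly with the 
existing API (the toy sequences follow the MLlib docs example; an existing 
SparkContext sc is assumed):

{code}
import org.apache.spark.mllib.fpm.PrefixSpan

val sequences = sc.parallelize(Seq(
  Array(Array(1, 2), Array(3)),
  Array(Array(1), Array(3, 2), Array(1, 2)),
  Array(Array(1, 2), Array(5)),
  Array(Array(6))
), 2).cache()

val prefixSpan = new PrefixSpan()
  .setMinSupport(0.5)
  .setMaxPatternLength(Int.MaxValue) // default is 10
val model = prefixSpan.run(sequences)
model.freqSequences.collect().foreach { fs =>
  println(fs.sequence.map(_.mkString("[", ",", "]")).mkString("<", "", ">") + ", " + fs.freq)
}
{code}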

> Change default maxPatternLength value to Int.MaxValue in PrefixSpan
> ---
>
> Key: SPARK-20203
> URL: https://issues.apache.org/jira/browse/SPARK-20203
> Project: Spark
>  Issue Type: Wish
>  Components: MLlib
>Affects Versions: 2.1.0
>Reporter: Cyril de Vogelaere
>Priority: Trivial
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> I think changing the default value to Int.MaxValue would be more user 
> friendly, at least for new users.
> Personally, when I run an algorithm, I expect it to find all solutions by 
> default, and a limited number of them when I set the parameters to do so.
> The current implementation limits the length of solution patterns to 10,
> thus preventing all solutions from being printed when running slightly large 
> datasets.
> I feel like that should be changed, but since this would change the default 
> behavior of PrefixSpan, I think asking for the community's opinion should 
> come first. So, what do you think?






[jira] [Commented] (SPARK-17742) Spark Launcher does not get failed state in Listener

2017-04-04 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954905#comment-15954905
 ] 

Sean Owen commented on SPARK-17742:
---

[~daanvdn] did you implement Marcelo's suggestion? this is pretty WYSIWYG -- if 
you want to push a change, investigate it and test and then open a PR. If you 
don't see anything here, nobody's working on it.

> Spark Launcher does not get failed state in Listener 
> -
>
> Key: SPARK-17742
> URL: https://issues.apache.org/jira/browse/SPARK-17742
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.0.0
>Reporter: Aseem Bansal
>
> I tried to launch an application using the below code. This is dummy code to 
> reproduce the problem. I tried exiting spark with status -1, throwing an 
> exception etc. but in no case did the listener give me failed status. But if 
> a spark job returns -1 or throws an exception from the main method it should 
> be considered as a failure. 
> {code}
> package com.example;
> import org.apache.spark.launcher.SparkAppHandle;
> import org.apache.spark.launcher.SparkLauncher;
> import java.io.IOException;
> public class Main2 {
> public static void main(String[] args) throws IOException, 
> InterruptedException {
> SparkLauncher launcher = new SparkLauncher()
> .setSparkHome("/opt/spark2")
> 
> .setAppResource("/home/aseem/projects/testsparkjob/build/libs/testsparkjob-1.0-SNAPSHOT.jar")
> .setMainClass("com.example.Main")
> .setMaster("local[2]");
> launcher.startApplication(new MyListener());
> Thread.sleep(1000 * 60);
> }
> }
> class MyListener implements SparkAppHandle.Listener {
> @Override
> public void stateChanged(SparkAppHandle handle) {
> System.out.println("state changed " + handle.getState());
> }
> @Override
> public void infoChanged(SparkAppHandle handle) {
> System.out.println("info changed " + handle.getState());
> }
> }
> {code}
> The spark job is 
> {code}
> package com.example;
> import org.apache.spark.sql.SparkSession;
> import java.io.IOException;
> public class Main {
> public static void main(String[] args) throws IOException {
> SparkSession sparkSession = SparkSession
> .builder()
> .appName("" + System.currentTimeMillis())
> .getOrCreate();
> try {
> for (int i = 0; i < 15; i++) {
> Thread.sleep(1000);
> System.out.println("sleeping 1");
> }
> } catch (InterruptedException e) {
> e.printStackTrace();
> }
> //sparkSession.stop();
> System.exit(-1);
> }
> }
> {code}






[jira] [Resolved] (SPARK-20203) Change default maxPatternLength value to Int.MaxValue in PrefixSpan

2017-04-04 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-20203.
---
Resolution: Won't Fix

> Change default maxPatternLength value to Int.MaxValue in PrefixSpan
> ---
>
> Key: SPARK-20203
> URL: https://issues.apache.org/jira/browse/SPARK-20203
> Project: Spark
>  Issue Type: Wish
>  Components: MLlib
>Affects Versions: 2.1.0
>Reporter: Cyril de Vogelaere
>Priority: Trivial
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> I think changing the default value to Int.MaxValue would be more user 
> friendly, at least for new users.
> Personally, when I run an algorithm, I expect it to find all solutions by 
> default, and a limited number of them when I set the parameters to do so.
> The current implementation limits the length of solution patterns to 10,
> thus preventing all solutions from being printed when running slightly large 
> datasets.
> I feel like that should be changed, but since this would change the default 
> behavior of PrefixSpan, I think asking for the community's opinion should 
> come first. So, what do you think?






[jira] [Commented] (SPARK-20193) Selecting empty struct causes ExpressionEncoder error.

2017-04-04 Thread Adrian Ionescu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954911#comment-15954911
 ] 

Adrian Ionescu commented on SPARK-20193:


In that case better change the signature of the function:
{{def struct(col: Column, cols: Column*): Column}}

> Selecting empty struct causes ExpressionEncoder error.
> --
>
> Key: SPARK-20193
> URL: https://issues.apache.org/jira/browse/SPARK-20193
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, SQL
>Affects Versions: 2.1.0
>Reporter: Adrian Ionescu
>Priority: Minor
>
> {{def struct(cols: Column*): Column}}
> Given the above signature and the lack of any note in the docs saying that a 
> struct with no columns is not supported, I would expect the following to work:
> {{spark.range(3).select(col("id"), struct().as("empty_struct")).collect}}
> However, this results in:
> {quote}
> java.lang.AssertionError: assertion failed: each serializer expression should 
> contains at least one `BoundReference`
>   at scala.Predef$.assert(Predef.scala:170)
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$$anonfun$11.apply(ExpressionEncoder.scala:240)
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$$anonfun$11.apply(ExpressionEncoder.scala:238)
>   at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
>   at scala.collection.immutable.List.flatMap(List.scala:344)
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.(ExpressionEncoder.scala:238)
>   at 
> org.apache.spark.sql.catalyst.encoders.RowEncoder$.apply(RowEncoder.scala:63)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
>   at 
> org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:2837)
>   at org.apache.spark.sql.Dataset.select(Dataset.scala:1131)
>   ... 39 elided
> {quote}






[jira] [Commented] (SPARK-20206) spark.ui.killEnabled=false property doesn't reflect on task/stages

2017-04-04 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954932#comment-15954932
 ] 

Sean Owen commented on SPARK-20206:
---

Can you first confirm in the Environments tab that the property is being set 
where you think it is?
Can you show a screen shot or further describe what page you're looking at?
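
For reference, a minimal sketch of setting the flag where the application is 
actually built, so that it shows up in the UI's Environment tab (the app name 
is arbitrary):

{code}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("kill-disabled-app")
  .config("spark.ui.killEnabled", "false")
  .getOrCreate()
{code}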

> spark.ui.killEnabled=false property doesn't reflect on task/stages
> --
>
> Key: SPARK-20206
> URL: https://issues.apache.org/jira/browse/SPARK-20206
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.1.0
>Reporter: srinivasan
>Priority: Minor
>
> The spark.ui.killEnabled=false property is not reflected on active tasks and 
> stages: the kill hyperlink is still enabled on active tasks and stages.






[jira] [Created] (SPARK-20210) Tests aborted in Spark SQL on ppc64le

2017-04-04 Thread Sonia Garudi (JIRA)
Sonia Garudi created SPARK-20210:


 Summary: Tests aborted in Spark SQL on ppc64le
 Key: SPARK-20210
 URL: https://issues.apache.org/jira/browse/SPARK-20210
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.2.0
 Environment: Ubuntu 14.04 ppc64le 
$ java -version
openjdk version "1.8.0_111"
OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-3~14.04.1-b14)
OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
Reporter: Sonia Garudi


The tests get aborted with the following error:
{code}
*** RUN ABORTED ***
  org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 
seconds]. This timeout is controlled by spark.rpc.askTimeout
  at 
org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
  at 
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
  at 
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
  at 
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
  at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
  at 
org.apache.spark.storage.BlockManagerMaster.removeRdd(BlockManagerMaster.scala:125)
  at 
org.apache.spark.SparkContext.unpersistRDD(SparkContext.scala:1792)
  at org.apache.spark.rdd.RDD.unpersist(RDD.scala:216)
  at 
org.apache.spark.sql.execution.CacheManager$$anonfun$clearCache$1$$anonfun$apply$mcV$sp$1.apply(CacheManager.scala:75)
  at 
org.apache.spark.sql.execution.CacheManager$$anonfun$clearCache$1$$anonfun$apply$mcV$sp$1.apply(CacheManager.scala:75)
  ...
  Cause: java.util.concurrent.TimeoutException: Futures timed out after 
[120 seconds]
  at 
scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
  at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
  at 
org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
  at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
  at 
org.apache.spark.storage.BlockManagerMaster.removeRdd(BlockManagerMaster.scala:125)
  at 
org.apache.spark.SparkContext.unpersistRDD(SparkContext.scala:1792)
  at org.apache.spark.rdd.RDD.unpersist(RDD.scala:216)
  at 
org.apache.spark.sql.execution.CacheManager$$anonfun$clearCache$1$$anonfun$apply$mcV$sp$1.apply(CacheManager.scala:75)
  at 
org.apache.spark.sql.execution.CacheManager$$anonfun$clearCache$1$$anonfun$apply$mcV$sp$1.apply(CacheManager.scala:75)
  at scala.collection.Iterator$class.foreach(Iterator.scala:893)
  ...
{code}






[jira] [Updated] (SPARK-20210) Scala tests aborted in Spark SQL on ppc64le

2017-04-04 Thread Sonia Garudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sonia Garudi updated SPARK-20210:
-
Summary: Scala tests aborted in Spark SQL on ppc64le  (was: Tests aborted 
in Spark SQL on ppc64le)

> Scala tests aborted in Spark SQL on ppc64le
> ---
>
> Key: SPARK-20210
> URL: https://issues.apache.org/jira/browse/SPARK-20210
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
> Environment: Ubuntu 14.04 ppc64le 
> $ java -version
> openjdk version "1.8.0_111"
> OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-3~14.04.1-b14)
> OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
>Reporter: Sonia Garudi
>  Labels: ppc64le
>
> The tests get aborted with the following error:
> {code}
> *** RUN ABORTED ***
>   org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 
> seconds]. This timeout is controlled by spark.rpc.askTimeout
>   at 
> org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
>   at 
> org.apache.spark.storage.BlockManagerMaster.removeRdd(BlockManagerMaster.scala:125)
>   at 
> org.apache.spark.SparkContext.unpersistRDD(SparkContext.scala:1792)
>   at org.apache.spark.rdd.RDD.unpersist(RDD.scala:216)
>   at 
> org.apache.spark.sql.execution.CacheManager$$anonfun$clearCache$1$$anonfun$apply$mcV$sp$1.apply(CacheManager.scala:75)
>   at 
> org.apache.spark.sql.execution.CacheManager$$anonfun$clearCache$1$$anonfun$apply$mcV$sp$1.apply(CacheManager.scala:75)
>   ...
>   Cause: java.util.concurrent.TimeoutException: Futures timed out after 
> [120 seconds]
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>   at 
> org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
>   at 
> org.apache.spark.storage.BlockManagerMaster.removeRdd(BlockManagerMaster.scala:125)
>   at 
> org.apache.spark.SparkContext.unpersistRDD(SparkContext.scala:1792)
>   at org.apache.spark.rdd.RDD.unpersist(RDD.scala:216)
>   at 
> org.apache.spark.sql.execution.CacheManager$$anonfun$clearCache$1$$anonfun$apply$mcV$sp$1.apply(CacheManager.scala:75)
>   at 
> org.apache.spark.sql.execution.CacheManager$$anonfun$clearCache$1$$anonfun$apply$mcV$sp$1.apply(CacheManager.scala:75)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   ...
> {code}






[jira] [Updated] (SPARK-20210) Scala tests aborted in Spark SQL on ppc64le

2017-04-04 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-20210:
--
  Priority: Minor  (was: Major)
Issue Type: Improvement  (was: Bug)

Are you sure it's due to PPC, or just a flaky test?
BTW I'm not sure we would consider PPC supported, so I won't mark this as a 
bug. However, if there are simple ways to make it just work, that's fine.

> Scala tests aborted in Spark SQL on ppc64le
> ---
>
> Key: SPARK-20210
> URL: https://issues.apache.org/jira/browse/SPARK-20210
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.0
> Environment: Ubuntu 14.04 ppc64le 
> $ java -version
> openjdk version "1.8.0_111"
> OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-3~14.04.1-b14)
> OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
>Reporter: Sonia Garudi
>Priority: Minor
>  Labels: ppc64le
>
> The tests get aborted with the following error:
> {code}
> *** RUN ABORTED ***
>   org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 
> seconds]. This timeout is controlled by spark.rpc.askTimeout
>   at 
> org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
>   at 
> org.apache.spark.storage.BlockManagerMaster.removeRdd(BlockManagerMaster.scala:125)
>   at 
> org.apache.spark.SparkContext.unpersistRDD(SparkContext.scala:1792)
>   at org.apache.spark.rdd.RDD.unpersist(RDD.scala:216)
>   at 
> org.apache.spark.sql.execution.CacheManager$$anonfun$clearCache$1$$anonfun$apply$mcV$sp$1.apply(CacheManager.scala:75)
>   at 
> org.apache.spark.sql.execution.CacheManager$$anonfun$clearCache$1$$anonfun$apply$mcV$sp$1.apply(CacheManager.scala:75)
>   ...
>   Cause: java.util.concurrent.TimeoutException: Futures timed out after 
> [120 seconds]
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>   at 
> org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
>   at 
> org.apache.spark.storage.BlockManagerMaster.removeRdd(BlockManagerMaster.scala:125)
>   at 
> org.apache.spark.SparkContext.unpersistRDD(SparkContext.scala:1792)
>   at org.apache.spark.rdd.RDD.unpersist(RDD.scala:216)
>   at 
> org.apache.spark.sql.execution.CacheManager$$anonfun$clearCache$1$$anonfun$apply$mcV$sp$1.apply(CacheManager.scala:75)
>   at 
> org.apache.spark.sql.execution.CacheManager$$anonfun$clearCache$1$$anonfun$apply$mcV$sp$1.apply(CacheManager.scala:75)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   ...
> {code}






[jira] [Commented] (SPARK-20206) spark.ui.killEnabled=false property doesn't reflect on task/stages

2017-04-04 Thread srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954956#comment-15954956
 ] 

srinivasan commented on SPARK-20206:


Hi, I would like to close this issue. I had configured it wrongly; it seems that 
the property is not set where I submit the Spark job. Sorry.

> spark.ui.killEnabled=false property doesn't reflect on task/stages
> --
>
> Key: SPARK-20206
> URL: https://issues.apache.org/jira/browse/SPARK-20206
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.1.0
>Reporter: srinivasan
>Priority: Minor
>
> The spark.ui.killEnabled=false property is not reflected on active tasks and 
> stages: the kill hyperlink is still enabled on active tasks and stages.






[jira] [Closed] (SPARK-20206) spark.ui.killEnabled=false property doesn't reflect on task/stages

2017-04-04 Thread srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

srinivasan closed SPARK-20206.
--
Resolution: Not A Bug

> spark.ui.killEnabled=false property doesn't reflect on task/stages
> --
>
> Key: SPARK-20206
> URL: https://issues.apache.org/jira/browse/SPARK-20206
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.1.0
>Reporter: srinivasan
>Priority: Minor
>
> The spark.ui.killEnabled=false property is not reflected on active tasks and 
> stages: the kill hyperlink is still enabled on active tasks and stages.






[jira] [Commented] (SPARK-17742) Spark Launcher does not get failed state in Listener

2017-04-04 Thread Daan Van den Nest (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954988#comment-15954988
 ] 

Daan Van den Nest commented on SPARK-17742:
---

Hi [~srowen], unfortunately I'm a Java dev, so changing stuff in a Scala code 
base is quite a bit out of my comfort zone. In case that changes, I will 
definitely look into it :-)

Cheers

> Spark Launcher does not get failed state in Listener 
> -
>
> Key: SPARK-17742
> URL: https://issues.apache.org/jira/browse/SPARK-17742
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.0.0
>Reporter: Aseem Bansal
>
> I tried to launch an application using the below code. This is dummy code to 
> reproduce the problem. I tried exiting spark with status -1, throwing an 
> exception etc. but in no case did the listener give me failed status. But if 
> a spark job returns -1 or throws an exception from the main method it should 
> be considered as a failure. 
> {code}
> package com.example;
> import org.apache.spark.launcher.SparkAppHandle;
> import org.apache.spark.launcher.SparkLauncher;
> import java.io.IOException;
> public class Main2 {
> public static void main(String[] args) throws IOException, 
> InterruptedException {
> SparkLauncher launcher = new SparkLauncher()
> .setSparkHome("/opt/spark2")
> 
> .setAppResource("/home/aseem/projects/testsparkjob/build/libs/testsparkjob-1.0-SNAPSHOT.jar")
> .setMainClass("com.example.Main")
> .setMaster("local[2]");
> launcher.startApplication(new MyListener());
> Thread.sleep(1000 * 60);
> }
> }
> class MyListener implements SparkAppHandle.Listener {
> @Override
> public void stateChanged(SparkAppHandle handle) {
> System.out.println("state changed " + handle.getState());
> }
> @Override
> public void infoChanged(SparkAppHandle handle) {
> System.out.println("info changed " + handle.getState());
> }
> }
> {code}
> The spark job is 
> {code}
> package com.example;
> import org.apache.spark.sql.SparkSession;
> import java.io.IOException;
> public class Main {
> public static void main(String[] args) throws IOException {
> SparkSession sparkSession = SparkSession
> .builder()
> .appName("" + System.currentTimeMillis())
> .getOrCreate();
> try {
> for (int i = 0; i < 15; i++) {
> Thread.sleep(1000);
> System.out.println("sleeping 1");
> }
> } catch (InterruptedException e) {
> e.printStackTrace();
> }
> //sparkSession.stop();
> System.exit(-1);
> }
> }
> {code}






[jira] [Resolved] (SPARK-20198) Remove the inconsistency in table/function name conventions in SparkSession.Catalog APIs

2017-04-04 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-20198.
-
   Resolution: Fixed
Fix Version/s: 2.2.0

Issue resolved by pull request 17518
[https://github.com/apache/spark/pull/17518]

> Remove the inconsistency in table/function name conventions in 
> SparkSession.Catalog APIs
> 
>
> Key: SPARK-20198
> URL: https://issues.apache.org/jira/browse/SPARK-20198
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.1, 2.2.0
>Reporter: Xiao Li
>Assignee: Xiao Li
> Fix For: 2.2.0
>
>
> Observed by @felixcheung: in the `SparkSession`.`Catalog` APIs, we have 
> different conventions/rules for table/function identifiers/names. Most APIs 
> accept the qualified name (i.e., `databaseName`.`tableName` or 
> `databaseName`.`functionName`). However, the following five APIs do not 
> accept it:
> - def listColumns(tableName: String): Dataset[Column]
> - def getTable(tableName: String): Table 
> - def getFunction(functionName: String): Function
> - def tableExists(tableName: String): Boolean
> - def functionExists(functionName: String): Boolean
> It is desirable to make them more consistent with the other Catalog APIs, 
> update the function/API comments, and add the `@params` to clarify the 
> inputs we allow.
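
A small sketch of the inconsistency (database and table names are made up; an 
existing SparkSession named spark is assumed): most Catalog APIs take the 
qualified form, while the five listed above only offered separate-database 
overloads before this change.

{code}
spark.catalog.listTables("mydb")            // database-aware listing
spark.catalog.tableExists("mydb", "events") // explicit-database overload
spark.catalog.tableExists("mydb.events")    // qualified form, not accepted by the five APIs before this fix
{code}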






[jira] [Resolved] (SPARK-20185) csv decompressed incorrectly with extention other than 'gz'

2017-04-04 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-20185.
---
Resolution: Not A Problem

> csv decompressed incorrectly with extention other than 'gz'
> ---
>
> Key: SPARK-20185
> URL: https://issues.apache.org/jira/browse/SPARK-20185
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 1.6.3, 2.0.0, 2.0.1, 2.0.2, 2.1.0
>Reporter: Ran Mingxuan
>Priority: Minor
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> With the code below:
> val start_time = System.currentTimeMillis()
> val gzFile = spark.read
>   .format("com.databricks.spark.csv")
>   .option("header", "false")
>   .option("inferSchema", "false")
>   .option("codec", "gzip")
>   .load("/foo/someCsvFile.gz.bak")
> gzFile.repartition(1).write.mode("overwrite").parquet("/foo/")
> I got an error even though I specified the codec:
> WARN util.NativeCodeLoader: Unable to load native-hadoop library for your 
> platform... using builtin-java classes where applicable
> 17/03/23 15:44:55 WARN ipc.Client: Exception encountered while connecting to 
> the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
>  Operation category READ is not supported in state standby. Visit 
> https://s.apache.org/sbnn-error
> 17/03/23 15:44:58 ERROR executor.Executor: Exception in task 2.0 in stage 
> 12.0 (TID 977)
> java.lang.NullPointerException
>   at 
> org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:109)
>   at 
> org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:94)
>   at 
> org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$2.apply(CSVFileFormat.scala:167)
>   at 
> org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$2.apply(CSVFileFormat.scala:166)
> I had to add the extension to GzipCodec to make my code run.
> import org.apache.hadoop.io.compress.GzipCodec
> class BakGzipCodec extends GzipCodec {
>   override def getDefaultExtension(): String = ".gz.bak"
> }
> I suppose the file loader should pick the file codec based on the option 
> first, and only then fall back to the extension.
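
A hedged sketch (Spark 2.x style) of how the reporter's workaround can be wired up; the package name com.example and the paths are placeholders, and the custom codec class is assumed to be on the driver and executor classpath:

{code}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("read-gz-bak-csv").getOrCreate()

// Register the custom codec so Hadoop's CompressionCodecFactory can map the
// ".gz.bak" extension to gzip decompression when the file is read.
spark.sparkContext.hadoopConfiguration.set(
  "io.compression.codecs",
  "org.apache.hadoop.io.compress.GzipCodec,com.example.BakGzipCodec")

val gzFile = spark.read
  .format("com.databricks.spark.csv")
  .option("header", "false")
  .option("inferSchema", "false")
  .load("/foo/someCsvFile.gz.bak")
{code}

If the reader resolved the codec from the {{codec}} option first, as the report suggests, the custom class would not be needed at all.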



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-20144) spark.read.parquet no longer maintains ordering of the data

2017-04-04 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-20144.
---
Resolution: Not A Problem

> spark.read.parquet no longer maintains ordering of the data
> -
>
> Key: SPARK-20144
> URL: https://issues.apache.org/jira/browse/SPARK-20144
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Li Jin
>
> Hi, we are trying to upgrade Spark from 1.6.3 to 2.0.2. One issue we found is 
> that when we read parquet files in 2.0.2, the ordering of rows in the 
> resulting dataframe is not the same as the ordering of rows in the dataframe 
> from which the parquet file was produced. 
> This is because FileSourceStrategy.scala combines the parquet files into 
> fewer partitions and also reorders them. This breaks our workflows because 
> they assume the ordering of the data. 
> Is this considered a bug? Also, FileSourceStrategy and FileSourceScanExec 
> changed quite a bit from 2.0.2 to 2.1, so we are not sure whether this is 
> also an issue in 2.1.
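
Not part of the report, but a common mitigation sketch if downstream logic needs a stable order: carry an explicit ordering column and sort on read instead of relying on file/partition layout. The paths, column name, and the `df`/`spark` values below are hypothetical:

{code}
import org.apache.spark.sql.functions.monotonically_increasing_id

// When writing, persist an explicit ordering key alongside the data.
// The ids are increasing (though not consecutive), which is enough to sort by.
df.withColumn("row_idx", monotonically_increasing_id())
  .write.parquet("/data/events")

// When reading, restore the order explicitly rather than assuming the scan
// returns rows in the order the files were written.
val restored = spark.read.parquet("/data/events").orderBy("row_idx")
{code}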



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-14492) Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; it's not backwards compatible with earlier versions

2017-04-04 Thread Sunil Rangwani (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil Rangwani reopened SPARK-14492:


It was closed with the comment "should work".
This needs looking into.

> Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; it's not 
> backwards compatible with earlier versions
> ---
>
> Key: SPARK-14492
> URL: https://issues.apache.org/jira/browse/SPARK-14492
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Sunil Rangwani
>Priority: Critical
>
> Spark SQL, when configured with a Hive version lower than 1.2.0, throws a 
> java.lang.NoSuchFieldError for the field METASTORE_CLIENT_SOCKET_LIFETIME. 
> This field was only introduced in Hive 1.2.0, so it's not possible to use a 
> Hive metastore version lower than 1.2.0 with Spark. The details of the Hive 
> changes can be found here: https://issues.apache.org/jira/browse/HIVE-9508 
> {code:java}
> Exception in thread "main" java.lang.NoSuchFieldError: 
> METASTORE_CLIENT_SOCKET_LIFETIME
>   at 
> org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:500)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237)
>   at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:441)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at org.apache.spark.sql.SQLContext.(SQLContext.scala:271)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:90)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:101)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:58)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:267)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:139)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-14492) Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; it's not backwards compatible with earlier versions

2017-04-04 Thread Sunil Rangwani (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil Rangwani updated SPARK-14492:
---
Comment: was deleted

(was: It was closed with the comment "should work"
Needs looking into)

> Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; it's not 
> backwards compatible with earlier versions
> ---
>
> Key: SPARK-14492
> URL: https://issues.apache.org/jira/browse/SPARK-14492
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Sunil Rangwani
>Priority: Critical
>
> Spark SQL, when configured with a Hive version lower than 1.2.0, throws a 
> java.lang.NoSuchFieldError for the field METASTORE_CLIENT_SOCKET_LIFETIME. 
> This field was only introduced in Hive 1.2.0, so it's not possible to use a 
> Hive metastore version lower than 1.2.0 with Spark. The details of the Hive 
> changes can be found here: https://issues.apache.org/jira/browse/HIVE-9508 
> {code:java}
> Exception in thread "main" java.lang.NoSuchFieldError: 
> METASTORE_CLIENT_SOCKET_LIFETIME
>   at 
> org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:500)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237)
>   at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:441)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at org.apache.spark.sql.SQLContext.(SQLContext.scala:271)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:90)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:101)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:58)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:267)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:139)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14492) Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; it's not backwards compatible with earlier versions

2017-04-04 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-14492.
---
Resolution: Not A Problem

[~sunil.rangwani] please don't reopen this unless you have new information. 
JIRA is not a place to ask other people to do the investigation, if that's what 
you're requesting; it is for proposing changes or reporting your own detailed 
investigation.

> Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; it's not 
> backwards compatible with earlier versions
> ---
>
> Key: SPARK-14492
> URL: https://issues.apache.org/jira/browse/SPARK-14492
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Sunil Rangwani
>Priority: Critical
>
> Spark SQL, when configured with a Hive version lower than 1.2.0, throws a 
> java.lang.NoSuchFieldError for the field METASTORE_CLIENT_SOCKET_LIFETIME. 
> This field was only introduced in Hive 1.2.0, so it's not possible to use a 
> Hive metastore version lower than 1.2.0 with Spark. The details of the Hive 
> changes can be found here: https://issues.apache.org/jira/browse/HIVE-9508 
> {code:java}
> Exception in thread "main" java.lang.NoSuchFieldError: 
> METASTORE_CLIENT_SOCKET_LIFETIME
>   at 
> org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:500)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237)
>   at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:441)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at org.apache.spark.sql.SQLContext.(SQLContext.scala:271)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:90)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:101)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:58)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:267)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:139)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-14492) Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; it's not backwards compatible with earlier versions

2017-04-04 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen closed SPARK-14492.
-

> Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; it's not 
> backwards compatible with earlier versions
> ---
>
> Key: SPARK-14492
> URL: https://issues.apache.org/jira/browse/SPARK-14492
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Sunil Rangwani
>Priority: Critical
>
> Spark SQL, when configured with a Hive version lower than 1.2.0, throws a 
> java.lang.NoSuchFieldError for the field METASTORE_CLIENT_SOCKET_LIFETIME. 
> This field was only introduced in Hive 1.2.0, so it's not possible to use a 
> Hive metastore version lower than 1.2.0 with Spark. The details of the Hive 
> changes can be found here: https://issues.apache.org/jira/browse/HIVE-9508 
> {code:java}
> Exception in thread "main" java.lang.NoSuchFieldError: 
> METASTORE_CLIENT_SOCKET_LIFETIME
>   at 
> org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:500)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237)
>   at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:441)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at org.apache.spark.sql.SQLContext.(SQLContext.scala:271)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:90)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:101)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:58)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:267)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:139)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20210) Scala tests aborted in Spark SQL on ppc64le

2017-04-04 Thread Sonia Garudi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955032#comment-15955032
 ] 

Sonia Garudi commented on SPARK-20210:
--

[~srowen], it's not a flaky test, although I am not sure whether it's due to the 
PPC architecture. I have tested it on both ppc64le and x86; the tests ran and 
passed on the x86 platform.

> Scala tests aborted in Spark SQL on ppc64le
> ---
>
> Key: SPARK-20210
> URL: https://issues.apache.org/jira/browse/SPARK-20210
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.0
> Environment: Ubuntu 14.04 ppc64le 
> $ java -version
> openjdk version "1.8.0_111"
> OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-3~14.04.1-b14)
> OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
>Reporter: Sonia Garudi
>Priority: Minor
>  Labels: ppc64le
>
> The tests get aborted with the following error :
> {code}
> *** RUN ABORTED ***
>   org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 
> seconds]. This timeout is controlled by spark.rpc.askTimeout
>   at 
> org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
>   at 
> org.apache.spark.storage.BlockManagerMaster.removeRdd(BlockManagerMaster.scala:125)
>   at 
> org.apache.spark.SparkContext.unpersistRDD(SparkContext.scala:1792)
>   at org.apache.spark.rdd.RDD.unpersist(RDD.scala:216)
>   at 
> org.apache.spark.sql.execution.CacheManager$$anonfun$clearCache$1$$anonfun$apply$mcV$sp$1.apply(CacheManager.scala:75)
>   at 
> org.apache.spark.sql.execution.CacheManager$$anonfun$clearCache$1$$anonfun$apply$mcV$sp$1.apply(CacheManager.scala:75)
>   ...
>   Cause: java.util.concurrent.TimeoutException: Futures timed out after 
> [120 seconds]
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>   at 
> org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
>   at 
> org.apache.spark.storage.BlockManagerMaster.removeRdd(BlockManagerMaster.scala:125)
>   at 
> org.apache.spark.SparkContext.unpersistRDD(SparkContext.scala:1792)
>   at org.apache.spark.rdd.RDD.unpersist(RDD.scala:216)
>   at 
> org.apache.spark.sql.execution.CacheManager$$anonfun$clearCache$1$$anonfun$apply$mcV$sp$1.apply(CacheManager.scala:75)
>   at 
> org.apache.spark.sql.execution.CacheManager$$anonfun$clearCache$1$$anonfun$apply$mcV$sp$1.apply(CacheManager.scala:75)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   ...
> {code}
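
Not a fix for whatever is aborting the suite, but when triaging a slow or unusual platform it can help to raise the timeout the error message points at ({{spark.rpc.askTimeout}}). A minimal sketch, assuming a local test session:

{code}
import org.apache.spark.sql.SparkSession

// spark.rpc.askTimeout falls back to spark.network.timeout (120s by default),
// which matches the "Futures timed out after [120 seconds]" message above.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("ppc64le-triage")
  .config("spark.rpc.askTimeout", "600s")
  .getOrCreate()
{code}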



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20202) Remove references to org.spark-project.hive

2017-04-04 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955188#comment-15955188
 ] 

Sean Owen commented on SPARK-20202:
---

[~marmbrus] as release manager of the moment, I suggest we formally vote on 
releasing the org.spark-project.hive artifact, as I'm not clear we ever did so 
formally. That much seems like a must-have. I don't know that it requires 
re-releasing the artifacts, but at least having a meaningful review of what it 
is, and agreeing (or not) that it's what the PMC wants to release, would, I 
believe, resolve doubts about the legitimacy of that artifact.

There's no doubt still more to do to get rid of the fork. This might include 
seeing whether Hive 1.2.x can provide an un-uberized artifact.

> Remove references to org.spark-project.hive
> ---
>
> Key: SPARK-20202
> URL: https://issues.apache.org/jira/browse/SPARK-20202
> Project: Spark
>  Issue Type: Bug
>  Components: Build, SQL
>Affects Versions: 1.6.4, 2.0.3, 2.1.1
>Reporter: Owen O'Malley
>Priority: Blocker
>
> Spark can't continue to depend on its fork of Hive and must move to 
> standard Hive versions.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12606) Scala/Java compatibility issue Re: how to extend java transformer from Scala UnaryTransformer ?

2017-04-04 Thread Guillaume Dardelet (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955191#comment-15955191
 ] 

Guillaume Dardelet commented on SPARK-12606:


I had the same issue in Scala and I solved it by overloading the constructor so 
that it initialises the UID.

The error comes from the initialisation of the parameter "inputCol".
You get "null__inputCol" because when the parameter was initialised, your class 
didn't have a uid.

Therefore, instead of

class Lemmatizer extends UnaryTransformer[String, String, Lemmatizer] {
  override val uid: String = Identifiable.randomUID("lemmatizer")
  protected def createTransformFunc: String => String = ???
  protected def outputDataType: DataType = StringType
}

Do this:

class Lemmatizer(override val uid: String) extends UnaryTransformer[String, 
String, Lemmatizer] {
  def this() = this( Identifiable.randomUID("lemmatizer") )
  protected def createTransformFunc: String => String = ???
  protected def outputDataType: DataType = StringType
}
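
A quick, hypothetical usage check of the corrected pattern (the column names are made up): because the auxiliary constructor assigns the UID before any Param is created, the setters no longer fail with "Param null__inputCol does not belong to ...".

{code}
// With the constructor-first UID, the params are owned by a properly named instance.
val lemmatizer = new Lemmatizer()
  .setInputCol("raw")
  .setOutputCol("lemmas")
println(lemmatizer.uid)   // e.g. "lemmatizer_0123abcd..." rather than a null-prefixed owner
{code}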

> Scala/Java compatibility issue Re: how to extend java transformer from Scala 
> UnaryTransformer ?
> ---
>
> Key: SPARK-12606
> URL: https://issues.apache.org/jira/browse/SPARK-12606
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 1.5.2
> Environment: Java 8, Mac OS, Spark-1.5.2
>Reporter: Andrew Davidson
>  Labels: transformers
>
> Hi Andy,
> I suspect that you hit the Scala/Java compatibility issue, I can also 
> reproduce this issue, so could you file a JIRA to track this issue?
> Yanbo
> 2016-01-02 3:38 GMT+08:00 Andy Davidson :
> I am trying to write a trivial transformer to use in my pipeline. I am 
> using Java and Spark 1.5.2. It was suggested that I use the Tokenizer.scala 
> class as an example. This should be very easy; however, I do not understand 
> Scala, and I am having trouble debugging the following exception.
> Any help would be greatly appreciated.
> Happy New Year
> Andy
> java.lang.IllegalArgumentException: requirement failed: Param null__inputCol 
> does not belong to Stemmer_2f3aa96d-7919-4eaa-ad54-f7c620b92d1c.
>   at scala.Predef$.require(Predef.scala:233)
>   at org.apache.spark.ml.param.Params$class.shouldOwn(params.scala:557)
>   at org.apache.spark.ml.param.Params$class.set(params.scala:436)
>   at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:37)
>   at org.apache.spark.ml.param.Params$class.set(params.scala:422)
>   at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:37)
>   at 
> org.apache.spark.ml.UnaryTransformer.setInputCol(Transformer.scala:83)
>   at com.pws.xxx.ml.StemmerTest.test(StemmerTest.java:30)
> public class StemmerTest extends AbstractSparkTest {
> @Test
> public void test() {
> Stemmer stemmer = new Stemmer()
> .setInputCol("raw”) //line 30
> .setOutputCol("filtered");
> }
> }
> /**
>  * @ see 
> spark-1.5.1/mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
>  * @ see 
> https://chimpler.wordpress.com/2014/06/11/classifiying-documents-using-naive-bayes-on-apache-spark-mllib/
>  * @ see 
> http://www.tonytruong.net/movie-rating-prediction-with-apache-spark-and-hortonworks/
>  * 
>  * @author andrewdavidson
>  *
>  */
> public class Stemmer extends UnaryTransformer<List<String>, List<String>, 
> Stemmer> implements Serializable {
> static Logger logger = LoggerFactory.getLogger(Stemmer.class);
> private static final long serialVersionUID = 1L;
> private static final  ArrayType inputType = 
> DataTypes.createArrayType(DataTypes.StringType, true);
> private final String uid = Stemmer.class.getSimpleName() + "_" + 
> UUID.randomUUID().toString();
> @Override
> public String uid() {
> return uid;
> }
> /*
>override protected def validateInputType(inputType: DataType): Unit = {
> require(inputType == StringType, s"Input type must be string type but got 
> $inputType.")
>   }
>  */
> @Override
> public void validateInputType(DataType inputTypeArg) {
> String msg = "inputType must be " + inputType.simpleString() + " but 
> got " + inputTypeArg.simpleString();
> assert (inputType.equals(inputTypeArg)) : msg; 
> }
> 
> @Override
> public Function1<List<String>, List<String>> createTransformFunc() {
> // 
> http://stackoverflow.com/questions/6545066/using-scala-from-java-passing-functions-as-parameters
> Function1<List<String>, List<String>> f = new 
> AbstractFunction1<List<String>, List<String>>() {
> public List<String> apply(List<String> words) {
> for(String word : words) {
> logger.error("AEDWIP input word: {}", word);
> }
> return wor

[jira] [Comment Edited] (SPARK-12606) Scala/Java compatibility issue Re: how to extend java transformer from Scala UnaryTransformer ?

2017-04-04 Thread Guillaume Dardelet (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955191#comment-15955191
 ] 

Guillaume Dardelet edited comment on SPARK-12606 at 4/4/17 2:27 PM:


I had the same issue in Scala and I solved it by overloading the constructor so 
that it initialises the UID.

The error comes from the initialisation of the parameter "inputCol".
You get "null__inputCol" because when the parameter was initialised, your class 
didn't have a uid.

Therefore, instead of

{code}
class Lemmatizer extends UnaryTransformer[String, String, Lemmatizer] {
  override val uid: String = Identifiable.randomUID("lemmatizer")
  protected def createTransformFunc: String => String = ???
  protected def outputDataType: DataType = StringType
}
{code}

Do this:

{code}
class Lemmatizer(override val uid: String) extends UnaryTransformer[String, 
String, Lemmatizer] {
  def this() = this( Identifiable.randomUID("lemmatizer") )
  protected def createTransformFunc: String => String = ???
  protected def outputDataType: DataType = StringType
}
{code}


was (Author: panoramix):
I had the same issue in Scala and I solved it by overloading the constructor so 
that it initialises the UID.

The error comes from the initialisation of the parameter "inputCol".
You get "null__inputCol" because when the parameter was initialised, your class 
didn't have a uid.

Therefore, instead of

{code:scala}
class Lemmatizer extends UnaryTransformer[String, String, Lemmatizer] {
  override val uid: String = Identifiable.randomUID("lemmatizer")
  protected def createTransformFunc: String => String = ???
  protected def outputDataType: DataType = StringType
}
{code}

Do this:

class Lemmatizer(override val uid: String) extends UnaryTransformer[String, 
String, Lemmatizer] {
  def this() = this( Identifiable.randomUID("lemmatizer") )
  protected def createTransformFunc: String => String = ???
  protected def outputDataType: DataType = StringType
}

> Scala/Java compatibility issue Re: how to extend java transformer from Scala 
> UnaryTransformer ?
> ---
>
> Key: SPARK-12606
> URL: https://issues.apache.org/jira/browse/SPARK-12606
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 1.5.2
> Environment: Java 8, Mac OS, Spark-1.5.2
>Reporter: Andrew Davidson
>  Labels: transformers
>
> Hi Andy,
> I suspect that you hit the Scala/Java compatibility issue, I can also 
> reproduce this issue, so could you file a JIRA to track this issue?
> Yanbo
> 2016-01-02 3:38 GMT+08:00 Andy Davidson :
> I am trying to write a trivial transformer to use in my pipeline. I am 
> using Java and Spark 1.5.2. It was suggested that I use the Tokenizer.scala 
> class as an example. This should be very easy; however, I do not understand 
> Scala, and I am having trouble debugging the following exception.
> Any help would be greatly appreciated.
> Happy New Year
> Andy
> java.lang.IllegalArgumentException: requirement failed: Param null__inputCol 
> does not belong to Stemmer_2f3aa96d-7919-4eaa-ad54-f7c620b92d1c.
>   at scala.Predef$.require(Predef.scala:233)
>   at org.apache.spark.ml.param.Params$class.shouldOwn(params.scala:557)
>   at org.apache.spark.ml.param.Params$class.set(params.scala:436)
>   at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:37)
>   at org.apache.spark.ml.param.Params$class.set(params.scala:422)
>   at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:37)
>   at 
> org.apache.spark.ml.UnaryTransformer.setInputCol(Transformer.scala:83)
>   at com.pws.xxx.ml.StemmerTest.test(StemmerTest.java:30)
> public class StemmerTest extends AbstractSparkTest {
> @Test
> public void test() {
> Stemmer stemmer = new Stemmer()
> .setInputCol("raw”) //line 30
> .setOutputCol("filtered");
> }
> }
> /**
>  * @ see 
> spark-1.5.1/mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
>  * @ see 
> https://chimpler.wordpress.com/2014/06/11/classifiying-documents-using-naive-bayes-on-apache-spark-mllib/
>  * @ see 
> http://www.tonytruong.net/movie-rating-prediction-with-apache-spark-and-hortonworks/
>  * 
>  * @author andrewdavidson
>  *
>  */
> public class Stemmer extends UnaryTransformer<List<String>, List<String>, 
> Stemmer> implements Serializable {
> static Logger logger = LoggerFactory.getLogger(Stemmer.class);
> private static final long serialVersionUID = 1L;
> private static final  ArrayType inputType = 
> DataTypes.createArrayType(DataTypes.StringType, true);
> private final String uid = Stemmer.class.getSimpleName() + "_" + 
> UUID.randomUUID().toString();
> @Override
> public Strin

[jira] [Comment Edited] (SPARK-12606) Scala/Java compatibility issue Re: how to extend java transformer from Scala UnaryTransformer ?

2017-04-04 Thread Guillaume Dardelet (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955191#comment-15955191
 ] 

Guillaume Dardelet edited comment on SPARK-12606 at 4/4/17 2:27 PM:


I had the same issue in Scala and I solved it by overloading the constructor so 
that it initialises the UID.

The error comes from the initialisation of the parameter "inputCol".
You get "null__inputCol" because when the parameter was initialised, your class 
didn't have a uid.

Therefore, instead of

{code:scala}
class Lemmatizer extends UnaryTransformer[String, String, Lemmatizer] {
  override val uid: String = Identifiable.randomUID("lemmatizer")
  protected def createTransformFunc: String => String = ???
  protected def outputDataType: DataType = StringType
}
{code}

Do this:

class Lemmatizer(override val uid: String) extends UnaryTransformer[String, 
String, Lemmatizer] {
  def this() = this( Identifiable.randomUID("lemmatizer") )
  protected def createTransformFunc: String => String = ???
  protected def outputDataType: DataType = StringType
}


was (Author: panoramix):
I had the same issue in Scala and I solved it by overloading the constructor so 
that it initialises the UID.

The error comes from the initialisation of the parameter "inputCol".
You get "null__inputCol" because when the parameter was initialised, your class 
didn't have a uid.

Therefore, instead of

class Lemmatizer extends UnaryTransformer[String, String, Lemmatizer] {
  override val uid: String = Identifiable.randomUID("lemmatizer")
  protected def createTransformFunc: String => String = ???
  protected def outputDataType: DataType = StringType
}

Do this:

class Lemmatizer(override val uid: String) extends UnaryTransformer[String, 
String, Lemmatizer] {
  def this() = this( Identifiable.randomUID("lemmatizer") )
  protected def createTransformFunc: String => String = ???
  protected def outputDataType: DataType = StringType
}

> Scala/Java compatibility issue Re: how to extend java transformer from Scala 
> UnaryTransformer ?
> ---
>
> Key: SPARK-12606
> URL: https://issues.apache.org/jira/browse/SPARK-12606
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 1.5.2
> Environment: Java 8, Mac OS, Spark-1.5.2
>Reporter: Andrew Davidson
>  Labels: transformers
>
> Hi Andy,
> I suspect that you hit the Scala/Java compatibility issue, I can also 
> reproduce this issue, so could you file a JIRA to track this issue?
> Yanbo
> 2016-01-02 3:38 GMT+08:00 Andy Davidson :
> I am trying to write a trivial transformer to use in my pipeline. I am 
> using Java and Spark 1.5.2. It was suggested that I use the Tokenizer.scala 
> class as an example. This should be very easy; however, I do not understand 
> Scala, and I am having trouble debugging the following exception.
> Any help would be greatly appreciated.
> Happy New Year
> Andy
> java.lang.IllegalArgumentException: requirement failed: Param null__inputCol 
> does not belong to Stemmer_2f3aa96d-7919-4eaa-ad54-f7c620b92d1c.
>   at scala.Predef$.require(Predef.scala:233)
>   at org.apache.spark.ml.param.Params$class.shouldOwn(params.scala:557)
>   at org.apache.spark.ml.param.Params$class.set(params.scala:436)
>   at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:37)
>   at org.apache.spark.ml.param.Params$class.set(params.scala:422)
>   at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:37)
>   at 
> org.apache.spark.ml.UnaryTransformer.setInputCol(Transformer.scala:83)
>   at com.pws.xxx.ml.StemmerTest.test(StemmerTest.java:30)
> public class StemmerTest extends AbstractSparkTest {
> @Test
> public void test() {
> Stemmer stemmer = new Stemmer()
> .setInputCol("raw”) //line 30
> .setOutputCol("filtered");
> }
> }
> /**
>  * @ see 
> spark-1.5.1/mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
>  * @ see 
> https://chimpler.wordpress.com/2014/06/11/classifiying-documents-using-naive-bayes-on-apache-spark-mllib/
>  * @ see 
> http://www.tonytruong.net/movie-rating-prediction-with-apache-spark-and-hortonworks/
>  * 
>  * @author andrewdavidson
>  *
>  */
> public class Stemmer extends UnaryTransformer<List<String>, List<String>, 
> Stemmer> implements Serializable {
> static Logger logger = LoggerFactory.getLogger(Stemmer.class);
> private static final long serialVersionUID = 1L;
> private static final  ArrayType inputType = 
> DataTypes.createArrayType(DataTypes.StringType, true);
> private final String uid = Stemmer.class.getSimpleName() + "_" + 
> UUID.randomUUID().toString();
> @Override
> public String uid() {
> return u

[jira] [Created] (SPARK-20211) `1 > 0.0001` throws Decimal scale (0) cannot be greater than precision (-2) exception

2017-04-04 Thread StanZhai (JIRA)
StanZhai created SPARK-20211:


 Summary: `1 > 0.0001` throws Decimal scale (0) cannot be greater 
than precision (-2) exception
 Key: SPARK-20211
 URL: https://issues.apache.org/jira/browse/SPARK-20211
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.1.0, 2.0.2, 2.0.1, 2.0.0, 2.1.1
Reporter: StanZhai
Priority: Critical


The following SQL:
{code}
select 1 > 0.0001 from tb
{code}
throws Decimal scale (0) cannot be greater than precision (-2) exception in 
Spark 2.x.

`floor(0.0001)` and `ceil(0.0001)` have the same problem in Spark 1.6.x and 
Spark 2.x.
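
For what it is worth, one possible workaround sketch (not verified on every affected version) is to force the values through a double cast so the decimal precision path is avoided; `tb` is the reporter's table name:

{code}
// Hypothetical spark-shell session mirroring the failing queries above.
spark.sql("select cast(1 as double) > cast(0.0001 as double) from tb").show()
spark.sql("select floor(cast(0.0001 as double)), ceil(cast(0.0001 as double))").show()
{code}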



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20211) `1 > 0.0001` throws Decimal scale (0) cannot be greater than precision (-2) exception

2017-04-04 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-20211:
-
Priority: Minor  (was: Critical)

> `1 > 0.0001` throws Decimal scale (0) cannot be greater than precision (-2) 
> exception
> -
>
> Key: SPARK-20211
> URL: https://issues.apache.org/jira/browse/SPARK-20211
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1
>Reporter: StanZhai
>Priority: Minor
>  Labels: correctness
>
> The following SQL:
> {code}
> select 1 > 0.0001 from tb
> {code}
> throws Decimal scale (0) cannot be greater than precision (-2) exception in 
> Spark 2.x.
> `floor(0.0001)` and `ceil(0.0001)` have the same problem in Spark 1.6.x and 
> Spark 2.x.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20211) `1 > 0.0001` throws Decimal scale (0) cannot be greater than precision (-2) exception

2017-04-04 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955239#comment-15955239
 ] 

Hyukjin Kwon commented on SPARK-20211:
--

Please refer http://spark.apache.org/contributing.html

{quote}
Critical: a large minority of users are missing important functionality without 
this, and/or a workaround is difficult
{quote}

A workaround is shown below:

{code}
scala> sql("select double(1) > double(0.0001)").show()
+--------------------------------------------+
|(CAST(1 AS DOUBLE) > CAST(0.0001 AS DOUBLE))|
+--------------------------------------------+
|                                        true|
+--------------------------------------------+
{code}

Also, it states,

{quote}
Priority. Set to Major or below; higher priorities are generally reserved for 
committers to set
{quote}

> `1 > 0.0001` throws Decimal scale (0) cannot be greater than precision (-2) 
> exception
> -
>
> Key: SPARK-20211
> URL: https://issues.apache.org/jira/browse/SPARK-20211
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1
>Reporter: StanZhai
>Priority: Critical
>  Labels: correctness
>
> The following SQL:
> {code}
> select 1 > 0.0001 from tb
> {code}
> throws Decimal scale (0) cannot be greater than precision (-2) exception in 
> Spark 2.x.
> `floor(0.0001)` and `ceil(0.0001)` have the same problem in Spark 1.6.x and 
> Spark 2.x.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20211) `1 > 0.0001` throws Decimal scale (0) cannot be greater than precision (-2) exception

2017-04-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20211:


Assignee: Apache Spark

> `1 > 0.0001` throws Decimal scale (0) cannot be greater than precision (-2) 
> exception
> -
>
> Key: SPARK-20211
> URL: https://issues.apache.org/jira/browse/SPARK-20211
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1
>Reporter: StanZhai
>Assignee: Apache Spark
>Priority: Minor
>  Labels: correctness
>
> The following SQL:
> {code}
> select 1 > 0.0001 from tb
> {code}
> throws Decimal scale (0) cannot be greater than precision (-2) exception in 
> Spark 2.x.
> `floor(0.0001)` and `ceil(0.0001)` have the same problem in Spark 1.6.x and 
> Spark 2.x.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20211) `1 > 0.0001` throws Decimal scale (0) cannot be greater than precision (-2) exception

2017-04-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20211:


Assignee: (was: Apache Spark)

> `1 > 0.0001` throws Decimal scale (0) cannot be greater than precision (-2) 
> exception
> -
>
> Key: SPARK-20211
> URL: https://issues.apache.org/jira/browse/SPARK-20211
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1
>Reporter: StanZhai
>Priority: Minor
>  Labels: correctness
>
> The following SQL:
> {code}
> select 1 > 0.0001 from tb
> {code}
> throws Decimal scale (0) cannot be greater than precision (-2) exception in 
> Spark 2.x.
> `floor(0.0001)` and `ceil(0.0001)` have the same problem in Spark 1.6.x and 
> Spark 2.x.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20211) `1 > 0.0001` throws Decimal scale (0) cannot be greater than precision (-2) exception

2017-04-04 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955243#comment-15955243
 ] 

Apache Spark commented on SPARK-20211:
--

User 'stanzhai' has created a pull request for this issue:
https://github.com/apache/spark/pull/17529

> `1 > 0.0001` throws Decimal scale (0) cannot be greater than precision (-2) 
> exception
> -
>
> Key: SPARK-20211
> URL: https://issues.apache.org/jira/browse/SPARK-20211
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1
>Reporter: StanZhai
>Priority: Minor
>  Labels: correctness
>
> The following SQL:
> {code}
> select 1 > 0.0001 from tb
> {code}
> throws Decimal scale (0) cannot be greater than precision (-2) exception in 
> Spark 2.x.
> `floor(0.0001)` and `ceil(0.0001)` have the same problem in Spark 1.6.x and 
> Spark 2.x.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20211) `1 > 0.0001` throws Decimal scale (0) cannot be greater than precision (-2) exception

2017-04-04 Thread StanZhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

StanZhai updated SPARK-20211:
-
Priority: Major  (was: Minor)

> `1 > 0.0001` throws Decimal scale (0) cannot be greater than precision (-2) 
> exception
> -
>
> Key: SPARK-20211
> URL: https://issues.apache.org/jira/browse/SPARK-20211
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1
>Reporter: StanZhai
>  Labels: correctness
>
> The following SQL:
> {code}
> select 1 > 0.0001 from tb
> {code}
> throws Decimal scale (0) cannot be greater than precision (-2) exception in 
> Spark 2.x.
> `floor(0.0001)` and `ceil(0.0001)` have the same problem in Spark 1.6.x and 
> Spark 2.x.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14492) Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; it's not backwards compatible with earlier versions

2017-04-04 Thread Sunil Rangwani (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955284#comment-15955284
 ] 

Sunil Rangwani commented on SPARK-14492:


[~srowen] Please don't close with a comment "should work" without any helpful 
contribution. 
Let me describe the issue again:  
1. Spark was built with -PHive where it used Hive version 1.2.1 
2. config properties for {{spark.sql.hive.metastore.jars}} and 
{{spark.sql.hive.metastore.version}} were set for Hive version 1.0.0 as 
described in the documentation
3. Tried using spark sql by pointing to a remote external metastore service. 
Tried setting 1.2.1 in step #2 which did not work because external metastore 
was older version. 
Upgraded external metastore to 1.2.0 {color:red}*(not 1.2.1)*{color} which 
worked. The reason for this is because spark uses a Hive config property here 
https://github.com/apache/spark/blob/branch-1.6/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L500
 that was introduced only in Hive 1.2.0 as part of 
https://issues.apache.org/jira/browse/HIVE-9508 
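
For anyone trying to reproduce the setup in steps 1 to 3, a hedged sketch of the configuration involved; the property names are the documented Spark ones, while the application name and jar paths are placeholders:

{code}
import org.apache.spark.SparkConf

// Point Spark (built with -Phive against Hive 1.2.1 classes) at a Hive 1.0.0
// metastore using that version's own client jars, as described in step 2.
val conf = new SparkConf()
  .setAppName("external-hive-1.0-metastore")
  .set("spark.sql.hive.metastore.version", "1.0.0")
  .set("spark.sql.hive.metastore.jars", "/opt/hive-1.0.0/lib/*:/opt/hadoop/client/*")
{code}

As the reporter describes, this combination hits the NoSuchFieldError quoted below until the metastore side is on 1.2.0.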


> Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; it's not 
> backwards compatible with earlier versions
> ---
>
> Key: SPARK-14492
> URL: https://issues.apache.org/jira/browse/SPARK-14492
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Sunil Rangwani
>Priority: Critical
>
> Spark SQL, when configured with a Hive version lower than 1.2.0, throws a 
> java.lang.NoSuchFieldError for the field METASTORE_CLIENT_SOCKET_LIFETIME. 
> This field was only introduced in Hive 1.2.0, so it's not possible to use a 
> Hive metastore version lower than 1.2.0 with Spark. The details of the Hive 
> changes can be found here: https://issues.apache.org/jira/browse/HIVE-9508 
> {code:java}
> Exception in thread "main" java.lang.NoSuchFieldError: 
> METASTORE_CLIENT_SOCKET_LIFETIME
>   at 
> org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:500)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237)
>   at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:441)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at org.apache.spark.sql.SQLContext.(SQLContext.scala:271)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:90)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:101)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:58)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:267)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:139)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-14492) Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; it's not backwards compatible with earlier versions

2017-04-04 Thread Sunil Rangwani (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil Rangwani reopened SPARK-14492:


> Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; it's not 
> backwards compatible with earlier versions
> ---
>
> Key: SPARK-14492
> URL: https://issues.apache.org/jira/browse/SPARK-14492
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Sunil Rangwani
>Priority: Critical
>
> Spark SQL, when configured with a Hive version lower than 1.2.0, throws a 
> java.lang.NoSuchFieldError for the field METASTORE_CLIENT_SOCKET_LIFETIME. 
> This field was only introduced in Hive 1.2.0, so it's not possible to use a 
> Hive metastore version lower than 1.2.0 with Spark. The details of the Hive 
> changes can be found here: https://issues.apache.org/jira/browse/HIVE-9508 
> {code:java}
> Exception in thread "main" java.lang.NoSuchFieldError: 
> METASTORE_CLIENT_SOCKET_LIFETIME
>   at 
> org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:500)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237)
>   at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:441)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at org.apache.spark.sql.SQLContext.(SQLContext.scala:271)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:90)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:101)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:58)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:267)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:139)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20211) `1 > 0.0001` throws Decimal scale (0) cannot be greater than precision (-2) exception

2017-04-04 Thread StanZhai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955285#comment-15955285
 ] 

StanZhai commented on SPARK-20211:
--

A workaround is difficult for me, because all of my SQL is generated by a 
higher-level system and I cannot cast all columns to double.
FLOOR and CEIL are frequently used functions, and not all users will give 
feedback to the community when they encounter this problem.
We should pay attention to the correctness of the SQL.

> `1 > 0.0001` throws Decimal scale (0) cannot be greater than precision (-2) 
> exception
> -
>
> Key: SPARK-20211
> URL: https://issues.apache.org/jira/browse/SPARK-20211
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1
>Reporter: StanZhai
>  Labels: correctness
>
> The following SQL:
> {code}
> select 1 > 0.0001 from tb
> {code}
> throws Decimal scale (0) cannot be greater than precision (-2) exception in 
> Spark 2.x.
> `floor(0.0001)` and `ceil(0.0001)` have the same problem in Spark 1.6.x and 
> Spark 2.x.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-14492) Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; it's not backwards compatible with earlier versions

2017-04-04 Thread Sunil Rangwani (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955284#comment-15955284
 ] 

Sunil Rangwani edited comment on SPARK-14492 at 4/4/17 3:51 PM:


[~srowen] Please don't close with a comment "should work" without any helpful 
contribution. 
Let me describe the issue again:  
1. Spark was built with -PHive where it used Hive version 1.2.1 
2. config properties for {{spark.sql.hive.metastore.jars}} and 
{{spark.sql.hive.metastore.version}} were set for Hive version 1.0.0 as 
described in the documentation
3. Tried using spark sql by pointing to a remote external metastore service. 
Tried setting 1.2.1 in step #2 which did not work because external metastore 
was older version. 
Upgraded external metastore to 1.2.0 {color:red}*(not 1.2.1)*{color} which 
worked. The reason for this is because spark uses a Hive config property here 
https://github.com/apache/spark/blob/branch-1.6/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L500
 that was introduced only in Hive 1.2.0 as part of 
https://issues.apache.org/jira/browse/HIVE-9508 



was (Author: sunil.rangwani):
[~srowen] Please don't close with a comment "should work" without any helpful 
contribution. 
Let me describe the issue again:  
1. Spark was built with -PHive where it uses Hive version 1.2.1 
2. config properties for {{spark.sql.hive.metastore.jars}} and 
{{spark.sql.hive.metastore.version}} were set for Hive version 1.0.0 as 
described in the documentation
3. Tried using spark sql by pointing to a remote external metastore service. 
Tried setting 1.2.1 in step #2 which did not work because external metastore 
was older version. 
Upgraded external metastore to 1.2.0 {color:red}*(not 1.2.1)*{color} which 
worked. The reason for this is because spark uses a Hive config property here 
https://github.com/apache/spark/blob/branch-1.6/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L500
 that was introduced only in Hive 1.2.0 as part of 
https://issues.apache.org/jira/browse/HIVE-9508 


> Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; it's not 
> backwards compatible with earlier versions
> ---
>
> Key: SPARK-14492
> URL: https://issues.apache.org/jira/browse/SPARK-14492
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Sunil Rangwani
>Priority: Critical
>
> Spark SQL, when configured with a Hive version lower than 1.2.0, throws a 
> java.lang.NoSuchFieldError for the field METASTORE_CLIENT_SOCKET_LIFETIME. 
> This field was only introduced in Hive 1.2.0, so it's not possible to use a 
> Hive metastore version lower than 1.2.0 with Spark. The details of the Hive 
> changes can be found here: https://issues.apache.org/jira/browse/HIVE-9508 
> {code:java}
> Exception in thread "main" java.lang.NoSuchFieldError: 
> METASTORE_CLIENT_SOCKET_LIFETIME
>   at 
> org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:500)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237)
>   at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:441)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at org.apache.spark.sql.SQLContext.(SQLContext.scala:271)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:90)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:101)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:58)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:267)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:139)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.SparkSubmit

[jira] [Comment Edited] (SPARK-14492) Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; its not backwards compatible with earlier version

2017-04-04 Thread Sunil Rangwani (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955284#comment-15955284
 ] 

Sunil Rangwani edited comment on SPARK-14492 at 4/4/17 3:51 PM:


[~srowen] Please don't close with a comment "should work" without any helpful 
contribution. 
Let me describe the issue again:  
1. Spark was built with -PHive, which uses Hive version 1.2.1.
2. The config properties {{spark.sql.hive.metastore.jars}} and 
{{spark.sql.hive.metastore.version}} were set for Hive version 1.0.0, as 
described in the documentation.
3. Tried using Spark SQL by pointing it at a remote external metastore service. 
Tried setting 1.2.1 in step #2, which did not work because the external 
metastore was an older version. 
Upgraded the external metastore to 1.2.0 {color:red}*(not 1.2.1)*{color}, which 
worked. The reason is that Spark uses a Hive config property here 
https://github.com/apache/spark/blob/branch-1.6/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L500
 that was introduced only in Hive 1.2.0 as part of 
https://issues.apache.org/jira/browse/HIVE-9508 



was (Author: sunil.rangwani):
[~srowen] Please don't close with a comment "should work" without any helpful 
contribution. 
Let me describe the issue again:  
1. Spark was built with -PHive where it used Hive version 1.2.1 
2. config properties for {{spark.sql.hive.metastore.jars}} and 
{{spark.sql.hive.metastore.version}} were set for Hive version 1.0.0 as 
described in the documentation
3. Tried using spark sql by pointing to a remote external metastore service. 
Tried setting 1.2.1 in step #2 which did not work because external metastore 
was older version. 
Upgraded external metastore to 1.2.0 {color:red}*(not 1.2.1)*{color} which 
worked. The reason for this is because spark uses a Hive config property here 
https://github.com/apache/spark/blob/branch-1.6/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L500
 that was introduced only in Hive 1.2.0 as part of 
https://issues.apache.org/jira/browse/HIVE-9508 


> Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; its not 
> backwards compatible with earlier version
> ---
>
> Key: SPARK-14492
> URL: https://issues.apache.org/jira/browse/SPARK-14492
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Sunil Rangwani
>Priority: Critical
>
> Spark SQL when configured with a Hive version lower than 1.2.0 throws a 
> java.lang.NoSuchFieldError for the field METASTORE_CLIENT_SOCKET_LIFETIME 
> because this field was introduced in Hive 1.2.0 so its not possible to use 
> Hive metastore version lower than 1.2.0 with Spark. The details of the Hive 
> changes can be found here: https://issues.apache.org/jira/browse/HIVE-9508 
> {code:java}
> Exception in thread "main" java.lang.NoSuchFieldError: 
> METASTORE_CLIENT_SOCKET_LIFETIME
>   at 
> org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:500)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237)
>   at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:441)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at org.apache.spark.sql.SQLContext.(SQLContext.scala:271)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:90)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:101)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:58)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:267)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:139)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.SparkSubmit

[jira] [Comment Edited] (SPARK-14492) Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; its not backwards compatible with earlier version

2017-04-04 Thread Sunil Rangwani (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955284#comment-15955284
 ] 

Sunil Rangwani edited comment on SPARK-14492 at 4/4/17 3:55 PM:


[~srowen] Please don't close with a comment "should work" without any helpful 
contribution. 
Let me describe the issue again:  
1. Spark was built with -PHive, which used Hive version 1.2.1.
2. The config properties {{spark.sql.hive.metastore.jars}} and 
{{spark.sql.hive.metastore.version}} were set for Hive version 1.0.0, as 
described in the documentation.
3. Tried using Spark SQL by pointing it at a remote external metastore service. 
Tried setting 1.2.1 in step #2, which did not work because the external 
metastore was an older version. 
Upgraded the external metastore to 1.2.0 {color:red}*(not 1.2.1)*{color}, which 
worked. The reason is that Spark uses a Hive config property here 
https://github.com/apache/spark/blob/branch-1.6/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L500
 that was introduced only in Hive 1.2.0 as part of 
https://issues.apache.org/jira/browse/HIVE-9508 



was (Author: sunil.rangwani):
[~srowen] Please don't close with a comment "should work" without any helpful 
contribution. 
Let me describe the issue again:  
1. Spark was built with -PHive where it used Hive version 1.2.1 
2. config properties for {{spark.sql.hive.metastore.jars}} and 
{{spark.sql.hive.metastore.version}} were set for Hive version 1.0.0 as 
described in the documentation
3. Tried using spark sql by pointing to a remote external metastore service. 
Tried setting 1.2.1 in step #2 which did not work because external metastore 
was older version. 
Upgraded external metastore to 1.2.0 {color:red}*(not 1.2.1)*{color} which 
worked. The reason for this is because spark uses a Hive config property here 
https://github.com/apache/spark/blob/branch-1.6/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L500
 that was introduced only in Hive 1.2.0 as part of 
https://issues.apache.org/jira/browse/HIVE-9508 


> Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; its not 
> backwards compatible with earlier version
> ---
>
> Key: SPARK-14492
> URL: https://issues.apache.org/jira/browse/SPARK-14492
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Sunil Rangwani
>Priority: Critical
>
> Spark SQL when configured with a Hive version lower than 1.2.0 throws a 
> java.lang.NoSuchFieldError for the field METASTORE_CLIENT_SOCKET_LIFETIME 
> because this field was introduced in Hive 1.2.0 so its not possible to use 
> Hive metastore version lower than 1.2.0 with Spark. The details of the Hive 
> changes can be found here: https://issues.apache.org/jira/browse/HIVE-9508 
> {code:java}
> Exception in thread "main" java.lang.NoSuchFieldError: 
> METASTORE_CLIENT_SOCKET_LIFETIME
>   at 
> org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:500)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237)
>   at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:441)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at org.apache.spark.sql.SQLContext.(SQLContext.scala:271)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:90)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:101)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:58)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:267)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:139)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.SparkSubmit

[jira] [Commented] (SPARK-14492) Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; its not backwards compatible with earlier version

2017-04-04 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955294#comment-15955294
 ] 

Sean Owen commented on SPARK-14492:
---

This isn't even consistent with the title of this JIRA.
How would updating your external Hive metastore affect the code that Spark uses 
at runtime?

> Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; its not 
> backwards compatible with earlier version
> ---
>
> Key: SPARK-14492
> URL: https://issues.apache.org/jira/browse/SPARK-14492
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Sunil Rangwani
>Priority: Critical
>
> Spark SQL when configured with a Hive version lower than 1.2.0 throws a 
> java.lang.NoSuchFieldError for the field METASTORE_CLIENT_SOCKET_LIFETIME 
> because this field was introduced in Hive 1.2.0 so its not possible to use 
> Hive metastore version lower than 1.2.0 with Spark. The details of the Hive 
> changes can be found here: https://issues.apache.org/jira/browse/HIVE-9508 
> {code:java}
> Exception in thread "main" java.lang.NoSuchFieldError: 
> METASTORE_CLIENT_SOCKET_LIFETIME
>   at 
> org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:500)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237)
>   at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:441)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at org.apache.spark.sql.SQLContext.(SQLContext.scala:271)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:90)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:101)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:58)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:267)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:139)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-14492) Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; its not backwards compatible with earlier version

2017-04-04 Thread Sunil Rangwani (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955284#comment-15955284
 ] 

Sunil Rangwani edited comment on SPARK-14492 at 4/4/17 3:58 PM:


[~srowen] Please don't close with a comment "should work" without any helpful 
contribution. 
Let me describe the issue again:  
1. Spark was built with -PHive, which used Hive version 1.2.1.
2. The config properties {{spark.sql.hive.metastore.jars}} and 
{{spark.sql.hive.metastore.version}} were set for Hive version 1.0.0, as 
described in the documentation.
3. Tried using Spark SQL by pointing it at a remote external metastore service. 
Tried setting 1.2.1 in step #2, which did not work because the external 
metastore was an older version. 
Upgraded the external metastore service and database to version 1.2.0 
{color:red}*(not 1.2.1)*{color}, which worked. The reason is that Spark uses a 
Hive config property here 
https://github.com/apache/spark/blob/branch-1.6/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L500
 that was introduced only in Hive 1.2.0 as part of 
https://issues.apache.org/jira/browse/HIVE-9508 
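
For reference, a minimal sketch of the configuration described in steps 1–3 above, on Spark 1.6 in Scala. All paths, the app name, and the jar locations are hypothetical placeholders; the remote metastore endpoint itself is assumed to be provided via a {{hive-site.xml}} (with {{hive.metastore.uris}}) on the driver classpath:

{code:none}
// Sketch only: hypothetical paths, assuming Spark 1.6 and an external Hive 1.0.0 metastore.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val conf = new SparkConf()
  .setAppName("metastore-compat-check")
  // Version of the external Hive metastore being connected to.
  .set("spark.sql.hive.metastore.version", "1.0.0")
  // Colon-separated classpath holding the Hive client jars for that version (plus Hadoop jars).
  .set("spark.sql.hive.metastore.jars",
    "/opt/hive-1.0.0/lib/*:/opt/hadoop/share/hadoop/common/lib/*")

val sc = new SparkContext(conf)
// With an external metastore older than 1.2.0, initializing the metastore client
// here (or on the first query below) is where the error described in this issue surfaces.
val sqlContext = new HiveContext(sc)
sqlContext.sql("SHOW DATABASES").show()
{code}

The same two properties can equally be set in spark-defaults.conf or passed via --conf on spark-submit.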



was (Author: sunil.rangwani):
[~srowen] Please don't close with a comment "should work" without any helpful 
contribution. 
Let me describe the issue again:  
1. Spark was built with -PHive where it used Hive version 1.2.1 
2. config properties for {{spark.sql.hive.metastore.jars}} and 
{{spark.sql.hive.metastore.version}} were set for Hive version 1.0.0 as 
described in the documentation
3. Tried using spark sql by pointing to a remote external metastore service. 
Tried setting 1.2.1 in step #2 which did not work because external metastore 
was older version. 
Upgraded external metastore to 1.2.0 {color:red}*(not 1.2.1)*{color} which 
worked. The reason for this is because spark uses a Hive config property here 
https://github.com/apache/spark/blob/branch-1.6/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L500
 that was introduced only in Hive 1.2.0 as part of 
https://issues.apache.org/jira/browse/HIVE-9508 


> Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; its not 
> backwards compatible with earlier version
> ---
>
> Key: SPARK-14492
> URL: https://issues.apache.org/jira/browse/SPARK-14492
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Sunil Rangwani
>Priority: Critical
>
> Spark SQL when configured with a Hive version lower than 1.2.0 throws a 
> java.lang.NoSuchFieldError for the field METASTORE_CLIENT_SOCKET_LIFETIME 
> because this field was introduced in Hive 1.2.0 so its not possible to use 
> Hive metastore version lower than 1.2.0 with Spark. The details of the Hive 
> changes can be found here: https://issues.apache.org/jira/browse/HIVE-9508 
> {code:java}
> Exception in thread "main" java.lang.NoSuchFieldError: 
> METASTORE_CLIENT_SOCKET_LIFETIME
>   at 
> org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:500)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237)
>   at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:441)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at org.apache.spark.sql.SQLContext.(SQLContext.scala:271)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:90)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:101)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:58)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:267)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:139)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.a

[jira] [Commented] (SPARK-14492) Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; its not backwards compatible with earlier version

2017-04-04 Thread Sunil Rangwani (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955315#comment-15955315
 ] 

Sunil Rangwani commented on SPARK-14492:


{quote} How would updating your external Hive metastore affect the code that 
Spark uses at runtime? {quote}
It can't. It uses the jars specified in the {{spark.sql.hive.metastore.jars}} property.
{quote}This isn't even consistent with the title of this JIRA.{quote}
I will change the title so that it better reflects the actual problem. 

> Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; its not 
> backwards compatible with earlier version
> ---
>
> Key: SPARK-14492
> URL: https://issues.apache.org/jira/browse/SPARK-14492
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Sunil Rangwani
>Priority: Critical
>
> Spark SQL when configured with a Hive version lower than 1.2.0 throws a 
> java.lang.NoSuchFieldError for the field METASTORE_CLIENT_SOCKET_LIFETIME 
> because this field was introduced in Hive 1.2.0 so its not possible to use 
> Hive metastore version lower than 1.2.0 with Spark. The details of the Hive 
> changes can be found here: https://issues.apache.org/jira/browse/HIVE-9508 
> {code:java}
> Exception in thread "main" java.lang.NoSuchFieldError: 
> METASTORE_CLIENT_SOCKET_LIFETIME
>   at 
> org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:500)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237)
>   at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:441)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at org.apache.spark.sql.SQLContext.(SQLContext.scala:271)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:90)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:101)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:58)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:267)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:139)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14492) Spark SQL 1.6.0 does not work with external Hive metastore version lower than 1.2.0; its not backwards compatible with earlier version

2017-04-04 Thread Sunil Rangwani (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955317#comment-15955317
 ] 

Sunil Rangwani commented on SPARK-14492:


This appears to be a problem with version 2.1.0 too, as reported by another user.

> Spark SQL 1.6.0 does not work with external Hive metastore version lower than 
> 1.2.0; its not backwards compatible with earlier version
> --
>
> Key: SPARK-14492
> URL: https://issues.apache.org/jira/browse/SPARK-14492
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Sunil Rangwani
>Priority: Critical
>
> Spark SQL when configured with a Hive version lower than 1.2.0 throws a 
> java.lang.NoSuchFieldError for the field METASTORE_CLIENT_SOCKET_LIFETIME 
> because this field was introduced in Hive 1.2.0 so its not possible to use 
> Hive metastore version lower than 1.2.0 with Spark. The details of the Hive 
> changes can be found here: https://issues.apache.org/jira/browse/HIVE-9508 
> {code:java}
> Exception in thread "main" java.lang.NoSuchFieldError: 
> METASTORE_CLIENT_SOCKET_LIFETIME
>   at 
> org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:500)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237)
>   at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:441)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at org.apache.spark.sql.SQLContext.(SQLContext.scala:271)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:90)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:101)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:58)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:267)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:139)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14492) Spark SQL 1.6.0 does not work with external Hive metastore version lower than 1.2.0; its not backwards compatible with earlier version

2017-04-04 Thread Sunil Rangwani (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil Rangwani updated SPARK-14492:
---
Summary: Spark SQL 1.6.0 does not work with external Hive metastore version 
lower than 1.2.0; its not backwards compatible with earlier version  (was: 
Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; its not 
backwards compatible with earlier version)

> Spark SQL 1.6.0 does not work with external Hive metastore version lower than 
> 1.2.0; its not backwards compatible with earlier version
> --
>
> Key: SPARK-14492
> URL: https://issues.apache.org/jira/browse/SPARK-14492
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Sunil Rangwani
>Priority: Critical
>
> Spark SQL when configured with a Hive version lower than 1.2.0 throws a 
> java.lang.NoSuchFieldError for the field METASTORE_CLIENT_SOCKET_LIFETIME 
> because this field was introduced in Hive 1.2.0 so its not possible to use 
> Hive metastore version lower than 1.2.0 with Spark. The details of the Hive 
> changes can be found here: https://issues.apache.org/jira/browse/HIVE-9508 
> {code:java}
> Exception in thread "main" java.lang.NoSuchFieldError: 
> METASTORE_CLIENT_SOCKET_LIFETIME
>   at 
> org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:500)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237)
>   at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:441)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at org.apache.spark.sql.SQLContext.(SQLContext.scala:271)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:90)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:101)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:58)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:267)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:139)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-14492) Spark SQL 1.6.0 does not work with external Hive metastore version lower than 1.2.0; its not backwards compatible with earlier version

2017-04-04 Thread Sunil Rangwani (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955315#comment-15955315
 ] 

Sunil Rangwani edited comment on SPARK-14492 at 4/4/17 4:11 PM:


{quote} How would updating your external Hive metastore affect the code that 
Spark uses at runtime? {quote}
It can't. It uses the jars specified in the {{spark.sql.hive.metastore.jars}} property.
{quote}This isn't even consistent with the title of this JIRA.{quote}
I have changed the title so that it better reflects the actual problem. 


was (Author: sunil.rangwani):
{quote} How would updating your external Hive metastore affect the code that 
Spark uses at runtime? {quote}
It can't. It uses the jars specified in spark.sql.hive.metastore.jars property
{quote}This isn't even consistent with the title of this JIRA.{quote}
I will change the title to make it a better reflection of the actual problem. 

> Spark SQL 1.6.0 does not work with external Hive metastore version lower than 
> 1.2.0; its not backwards compatible with earlier version
> --
>
> Key: SPARK-14492
> URL: https://issues.apache.org/jira/browse/SPARK-14492
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Sunil Rangwani
>Priority: Critical
>
> Spark SQL when configured with a Hive version lower than 1.2.0 throws a 
> java.lang.NoSuchFieldError for the field METASTORE_CLIENT_SOCKET_LIFETIME 
> because this field was introduced in Hive 1.2.0 so its not possible to use 
> Hive metastore version lower than 1.2.0 with Spark. The details of the Hive 
> changes can be found here: https://issues.apache.org/jira/browse/HIVE-9508 
> {code:java}
> Exception in thread "main" java.lang.NoSuchFieldError: 
> METASTORE_CLIENT_SOCKET_LIFETIME
>   at 
> org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:500)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237)
>   at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:441)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at org.apache.spark.sql.SQLContext.(SQLContext.scala:271)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:90)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:101)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:58)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:267)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:139)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20204) remove SimpleCatalystConf and CatalystConf type alias

2017-04-04 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-20204:

Summary: remove SimpleCatalystConf and CatalystConf type alias  (was: 
separate SQLConf into catalyst confs and sql confs)

> remove SimpleCatalystConf and CatalystConf type alias
> -
>
> Key: SPARK-20204
> URL: https://issues.apache.org/jira/browse/SPARK-20204
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20026) Document R GLM Tweedie family support in programming guide and code example

2017-04-04 Thread Wayne Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955342#comment-15955342
 ] 

Wayne Zhang commented on SPARK-20026:
-

[~felixcheung] Yes, I will work on this. Thanks. 

> Document R GLM Tweedie family support in programming guide and code example
> ---
>
> Key: SPARK-20026
> URL: https://issues.apache.org/jira/browse/SPARK-20026
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, SparkR
>Affects Versions: 2.2.0
>Reporter: Felix Cheung
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14492) Spark SQL 1.6.0 does not work with external Hive metastore version lower than 1.2.0; its not backwards compatible with earlier version

2017-04-04 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955398#comment-15955398
 ] 

Sean Owen commented on SPARK-14492:
---

If this were true, that it doesn't work with 1.2.1 because of a 
NoSuchFieldError, then it would not have compiled vs Hive 1.2.1 even. Right?
The field you show was added in 1.2.0, according to the JIRA, not 1.2.1. Yet you 
say 1.2.0 works while 1.2.1 doesn't.
However, given this detail I _do_ agree with you that it looks like there's a 
real problem here, though I'd imagine it manifests with 1.1.x and earlier. Is 
that what you meant? It's what your JIRA says but not your last comment.

> Spark SQL 1.6.0 does not work with external Hive metastore version lower than 
> 1.2.0; its not backwards compatible with earlier version
> --
>
> Key: SPARK-14492
> URL: https://issues.apache.org/jira/browse/SPARK-14492
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Sunil Rangwani
>Priority: Critical
>
> Spark SQL when configured with a Hive version lower than 1.2.0 throws a 
> java.lang.NoSuchFieldError for the field METASTORE_CLIENT_SOCKET_LIFETIME 
> because this field was introduced in Hive 1.2.0 so its not possible to use 
> Hive metastore version lower than 1.2.0 with Spark. The details of the Hive 
> changes can be found here: https://issues.apache.org/jira/browse/HIVE-9508 
> {code:java}
> Exception in thread "main" java.lang.NoSuchFieldError: 
> METASTORE_CLIENT_SOCKET_LIFETIME
>   at 
> org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:500)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237)
>   at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:441)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at org.apache.spark.sql.SQLContext.(SQLContext.scala:271)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:90)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:101)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:58)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:267)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:139)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-20212) UDFs with Option[Primitive Type] don't work as expected

2017-04-04 Thread Marius Feteanu (JIRA)
Marius Feteanu created SPARK-20212:
--

 Summary: UDFs with Option[Primitive Type] don't work as expected
 Key: SPARK-20212
 URL: https://issues.apache.org/jira/browse/SPARK-20212
 Project: Spark
  Issue Type: Bug
  Components: Optimizer
Affects Versions: 2.1.0
Reporter: Marius Feteanu
Priority: Minor


The documentation for ScalaUDF says:

{code:none}
Note that if you use primitive parameters, you are not able to check if it is
null or not, and the UDF will return null for you if the primitive input is
null. Use boxed type or [[Option]] if you wanna do the null-handling yourself.
{code}

This works with boxed types:

{code:none}
import org.apache.spark.sql.functions.{col, udf}
import spark.implicits._

def is_null_box(x: java.lang.Long): String = {
  x match {
    case _: java.lang.Long => "Yep"
    case null => "No man"
  }
}
val is_null_box_udf = udf(is_null_box _)
val sample = (1L to 5L).toList.map(x => new java.lang.Long(x)) ++ List[java.lang.Long](null, null)
val df = sample.toDF("col1")
df.select(is_null_box_udf(col("col1"))).show(10)
{code}

But it does not work with Option\[Long\] as expected:

{code:none}
import org.apache.spark.sql.functions.{col, udf}
import spark.implicits._

def is_null_opt(x: Option[Long]): String = {
  x match {
    case Some(_: Long) => "Yep"
    case None => "No man"
  }
}
val is_null_opt_udf = udf(is_null_opt _)
val sample = (1L to 5L)
// This does not help: val sample = (1L to 5L).map(Some(_)).toList
val df = sample.toDF("col1")
df.select(is_null_opt_udf(col("col1"))).show(10)
{code}

That throws:
{code:none}
Caused by: java.lang.ClassCastException: java.lang.Long cannot be cast to 
scala.Option
  at $anonfun$1.apply(:56)
  at 
org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:89)
  at 
org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:88)
  at 
org.apache.spark.sql.catalyst.expressions.ScalaUDF.eval(ScalaUDF.scala:1069)
{code}

The current workaround is to use boxed types, but it makes for code that looks 
funny (a variant of that workaround is sketched below).
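
One way to keep the null handling explicit without threading {{java.lang.Long}} through the whole function body is to take the boxed type only at the UDF boundary and convert to {{Option}} immediately inside. A small sketch under the same setup as the examples above (this is only a restatement of the boxed-type workaround, not a fix for the Option behaviour reported here):

{code:none}
import org.apache.spark.sql.functions.{col, udf}
import spark.implicits._

// Boxed parameter at the boundary (so Spark can hand us null), Option inside.
val is_null_wrapped_udf = udf((x: java.lang.Long) =>
  Option(x).map(_ => "Yep").getOrElse("No man"))

val sample = (1L to 5L).toList.map(x => java.lang.Long.valueOf(x)) ++
  List[java.lang.Long](null, null)
val df = sample.toDF("col1")
df.select(is_null_wrapped_udf(col("col1"))).show(10)
{code}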

If you just use Long instead of boxing, the code may break in subtle ways (i.e. 
it does not fail; it returns null). That is documented but easy to miss (i.e. it 
is not part of the bug, but if someone "corrects" boxed functions to use 
primitive types they might get surprising results):
{code:none}
import org.apache.spark.sql.functions.{col, udf, expr}
import spark.implicits._

def is_null_opt(x: Long): String = {
  Option(x) match {
    case Some(_: Long) => "Yep"
    case None => "No man"
  }
}
val is_null_opt_udf = udf(is_null_opt _)
val sample = (1L to 5L)
val df = sample.toDF("col3").select(expr("CASE WHEN col3=2 THEN NULL ELSE col3 END").alias("col3"))
df.printSchema
df.select(is_null_opt_udf(col("col3"))).show(10)
{code}




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-20161) Default log4j properties file should print thread-id in ConversionPattern

2017-04-04 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-20161.
---
  Resolution: Won't Fix
Target Version/s:   (was: 2.2.0)

> Default log4j properties file should print thread-id in ConversionPattern
> -
>
> Key: SPARK-20161
> URL: https://issues.apache.org/jira/browse/SPARK-20161
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy, YARN
>Affects Versions: 2.1.0
>Reporter: Sahil Takiar
>
> The default log4j file in {{spark/conf/log4j.properties.template}} doesn't 
> display the thread-id when printing out the logs. It would be very useful to 
> add this, especially for YARN. Currently, logs from all the different threads 
> in a single executor are sent to the same log file. This makes debugging 
> difficult, as it is hard to tell which logs come from which thread.
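
For anyone wanting this locally in the meantime, the standard log4j {{PatternLayout}} conversion character for the thread name is {{%t}}. A sketch of the relevant line in a copied {{log4j.properties}} (the surrounding pattern shown is approximately the usual Spark template format and may differ between versions):

{code:none}
# Sketch only: add the thread name via %t to the console appender's pattern.
# Template pattern (approximately):
#   log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# With the thread name included:
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} [%t] %p %c{1}: %m%n
{code}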



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14492) Spark SQL 1.6.0 does not work with external Hive metastore version lower than 1.2.0; its not backwards compatible with earlier version

2017-04-04 Thread Sunil Rangwani (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955510#comment-15955510
 ] 

Sunil Rangwani commented on SPARK-14492:


{quote}If this were true, that it doesn't work with 1.2.1 because of a 
NoSuchFieldError, then it would not have compiled vs Hive 1.2.1 even. Right? 
{quote}
Sorry, to clarify: I meant that if I left the value of the property as 1.2.1 but 
used an external metastore of an older version, that didn't work. It gave a 
different error, which I don't remember, but it was not a NoSuchFieldError.
{quote}The field you show was added in 1.2.0, according to the JIRA, not 1.2.1 
Yet you say 1.2.0 works while 1.2.1 doesn't.{quote}
All versions from 1.2.0 upwards work fine. The minimum version of external 
metastore that works is 1.2.0.




> Spark SQL 1.6.0 does not work with external Hive metastore version lower than 
> 1.2.0; its not backwards compatible with earlier version
> --
>
> Key: SPARK-14492
> URL: https://issues.apache.org/jira/browse/SPARK-14492
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Sunil Rangwani
>Priority: Critical
>
> Spark SQL when configured with a Hive version lower than 1.2.0 throws a 
> java.lang.NoSuchFieldError for the field METASTORE_CLIENT_SOCKET_LIFETIME 
> because this field was introduced in Hive 1.2.0 so its not possible to use 
> Hive metastore version lower than 1.2.0 with Spark. The details of the Hive 
> changes can be found here: https://issues.apache.org/jira/browse/HIVE-9508 
> {code:java}
> Exception in thread "main" java.lang.NoSuchFieldError: 
> METASTORE_CLIENT_SOCKET_LIFETIME
>   at 
> org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:500)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237)
>   at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:441)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at org.apache.spark.sql.SQLContext.(SQLContext.scala:271)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:90)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:101)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:58)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:267)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:139)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-14492) Spark SQL 1.6.0 does not work with external Hive metastore version lower than 1.2.0; its not backwards compatible with earlier version

2017-04-04 Thread Sunil Rangwani (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955510#comment-15955510
 ] 

Sunil Rangwani edited comment on SPARK-14492 at 4/4/17 5:55 PM:


{quote}If this were true, that it doesn't work with 1.2.1 because of a 
NoSuchFieldError, then it would not have compiled vs Hive 1.2.1 even. Right? 
{quote}
Sorry, to clarify: I meant that if I left the value of the property as 1.2.1 but 
used an external metastore of an older version, that didn't work. It gave a 
different error, which I don't remember, but it was not a NoSuchFieldError.
{quote}The field you show was added in 1.2.0, according to the JIRA, not 1.2.1 
Yet you say 1.2.0 works while 1.2.1 doesn't.{quote}
All versions from 1.2.0 upwards work fine. The minimum version of external 
metastore that works is 1.2.0.





was (Author: sunil.rangwani):
{quote}If this were true, that it doesn't work with 1.2.1 because of a 
NoSuchFieldError, then it would not have compiled vs Hive 1.2.1 even. Right? 
{quote}
Sorry, to clarify I meant if I left the value of the property as 1.2.1 but used 
an external metastore of an older version, that didnt work. It did not give me 
a different error I don't remember but it was not NoSuchFieldError  
{quote}The field you show was added in 1.2.0, according to the JIRA, not 1.2.1 
Yet you say 1.2.0 works while 1.2.1 doesn't.{quote}
All versions upwards of 1.2.0 does work fine. The minimum version of external 
metastore that works is 1.2.0 




> Spark SQL 1.6.0 does not work with external Hive metastore version lower than 
> 1.2.0; its not backwards compatible with earlier version
> --
>
> Key: SPARK-14492
> URL: https://issues.apache.org/jira/browse/SPARK-14492
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Sunil Rangwani
>Priority: Critical
>
> Spark SQL when configured with a Hive version lower than 1.2.0 throws a 
> java.lang.NoSuchFieldError for the field METASTORE_CLIENT_SOCKET_LIFETIME 
> because this field was introduced in Hive 1.2.0 so its not possible to use 
> Hive metastore version lower than 1.2.0 with Spark. The details of the Hive 
> changes can be found here: https://issues.apache.org/jira/browse/HIVE-9508 
> {code:java}
> Exception in thread "main" java.lang.NoSuchFieldError: 
> METASTORE_CLIENT_SOCKET_LIFETIME
>   at 
> org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:500)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237)
>   at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:441)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at org.apache.spark.sql.SQLContext.(SQLContext.scala:271)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:90)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:101)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:58)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:267)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:139)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, 

[jira] [Comment Edited] (SPARK-14492) Spark SQL 1.6.0 does not work with external Hive metastore version lower than 1.2.0; its not backwards compatible with earlier version

2017-04-04 Thread Sunil Rangwani (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955510#comment-15955510
 ] 

Sunil Rangwani edited comment on SPARK-14492 at 4/4/17 5:58 PM:


{quote}If this were true, that it doesn't work with 1.2.1 because of a 
NoSuchFieldError, then it would not have compiled vs Hive 1.2.1 even. Right? 
{quote}
Sorry, to clarify: I meant that if I left the value of the property as 1.2.1 but 
used an external metastore of an older version, that didn't work. It gave a 
different error, which I don't remember, but it was not a NoSuchFieldError.
{quote}The field you show was added in 1.2.0, according to the JIRA, not 1.2.1 
Yet you say 1.2.0 works while 1.2.1 doesn't.{quote}
All versions from 1.2.0 upwards work fine. The minimum version of external 
metastore that works is 1.2.0. All versions below 1.2.0 give a NoSuchFieldError 
for different fields.


was (Author: sunil.rangwani):
{quote}If this were true, that it doesn't work with 1.2.1 because of a 
NoSuchFieldError, then it would not have compiled vs Hive 1.2.1 even. Right? 
{quote}
Sorry, to clarify I meant if I left the value of the property as 1.2.1 but used 
an external metastore of an older version, that didnt work. It gives a 
different error I don't remember but it was not NoSuchFieldError  
{quote}The field you show was added in 1.2.0, according to the JIRA, not 1.2.1 
Yet you say 1.2.0 works while 1.2.1 doesn't.{quote}
All versions upwards of 1.2.0 does work fine. The minimum version of external 
metastore that works is 1.2.0 




> Spark SQL 1.6.0 does not work with external Hive metastore version lower than 
> 1.2.0; its not backwards compatible with earlier version
> --
>
> Key: SPARK-14492
> URL: https://issues.apache.org/jira/browse/SPARK-14492
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Sunil Rangwani
>Priority: Critical
>
> Spark SQL when configured with a Hive version lower than 1.2.0 throws a 
> java.lang.NoSuchFieldError for the field METASTORE_CLIENT_SOCKET_LIFETIME 
> because this field was introduced in Hive 1.2.0 so its not possible to use 
> Hive metastore version lower than 1.2.0 with Spark. The details of the Hive 
> changes can be found here: https://issues.apache.org/jira/browse/HIVE-9508 
> {code:java}
> Exception in thread "main" java.lang.NoSuchFieldError: 
> METASTORE_CLIENT_SOCKET_LIFETIME
>   at 
> org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:500)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237)
>   at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:441)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at org.apache.spark.sql.SQLContext.(SQLContext.scala:271)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:90)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:101)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:58)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:267)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:139)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}
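
For readers landing on this thread, a minimal configuration sketch of pinning the 
metastore client version (an illustration based on the standard 
{{spark.sql.hive.metastore.*}} settings discussed here, not a confirmed fix for 
every environment):

{code}
// Hedged sketch for the Spark 1.6-era HiveContext; the property values are illustrative.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val conf = new SparkConf()
  .setAppName("hive-metastore-version-example")
  // 1.2.0 is the minimum external metastore version reported to work in this thread
  .set("spark.sql.hive.metastore.version", "1.2.0")
  // download matching Hive client jars, or point this at a classpath containing them
  .set("spark.sql.hive.metastore.jars", "maven")

val sc = new SparkContext(conf)
val hiveContext = new HiveContext(sc)
{code}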



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-

[jira] [Comment Edited] (SPARK-14492) Spark SQL 1.6.0 does not work with external Hive metastore version lower than 1.2.0; its not backwards compatible with earlier version

2017-04-04 Thread Sunil Rangwani (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955510#comment-15955510
 ] 

Sunil Rangwani edited comment on SPARK-14492 at 4/4/17 6:00 PM:


{quote}If this were true, that it doesn't work with 1.2.1 because of a 
NoSuchFieldError, then it would not have compiled vs Hive 1.2.1 even. Right? 
{quote}
Sorry, to clarify: I meant that if I left the value of the property as 1.2.1 but used 
an external metastore of an older version, that didn't work. It gives a 
different error that I don't remember, but it was not NoSuchFieldError.
{quote}The field you show was added in 1.2.0, according to the JIRA, not 1.2.1 
Yet you say 1.2.0 works while 1.2.1 doesn't.{quote}
All versions upwards of 1.2.0 do work fine. The minimum version of external 
metastore that works is 1.2.0 (I was just highlighting that I didn't have to upgrade 
up to 1.2.1). All versions below 1.2.0 give a NoSuchFieldError for different 
fields.


was (Author: sunil.rangwani):
{quote}If this were true, that it doesn't work with 1.2.1 because of a 
NoSuchFieldError, then it would not have compiled vs Hive 1.2.1 even. Right? 
{quote}
Sorry, to clarify I meant if I left the value of the property as 1.2.1 but used 
an external metastore of an older version, that didnt work. It gives a 
different error I don't remember but it was not NoSuchFieldError  
{quote}The field you show was added in 1.2.0, according to the JIRA, not 1.2.1 
Yet you say 1.2.0 works while 1.2.1 doesn't.{quote}
All versions upwards of 1.2.0 does work fine. The minimum version of external 
metastore that works is 1.2.0. All versions upto 1.2.0 give a NoSuchFieldError 
for different fields.   

> Spark SQL 1.6.0 does not work with external Hive metastore version lower than 
> 1.2.0; its not backwards compatible with earlier version
> --
>
> Key: SPARK-14492
> URL: https://issues.apache.org/jira/browse/SPARK-14492
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Sunil Rangwani
>Priority: Critical
>
> Spark SQL, when configured with a Hive version lower than 1.2.0, throws a 
> java.lang.NoSuchFieldError for the field METASTORE_CLIENT_SOCKET_LIFETIME 
> because this field was introduced in Hive 1.2.0, so it's not possible to use 
> a Hive metastore version lower than 1.2.0 with Spark. The details of the Hive 
> changes can be found here: https://issues.apache.org/jira/browse/HIVE-9508 
> {code:java}
> Exception in thread "main" java.lang.NoSuchFieldError: 
> METASTORE_CLIENT_SOCKET_LIFETIME
>   at 
> org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:500)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237)
>   at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:441)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at org.apache.spark.sql.SQLContext.(SQLContext.scala:271)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:90)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:101)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:58)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:267)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:139)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>   at org.apache.spark.deploy

[jira] [Comment Edited] (SPARK-14492) Spark SQL 1.6.0 does not work with external Hive metastore version lower than 1.2.0; its not backwards compatible with earlier version

2017-04-04 Thread Sunil Rangwani (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955510#comment-15955510
 ] 

Sunil Rangwani edited comment on SPARK-14492 at 4/4/17 6:06 PM:


{quote}If this were true, that it doesn't work with 1.2.1 because of a 
NoSuchFieldError, then it would not have compiled vs Hive 1.2.1 even. Right? 
{quote}
Sorry, to clarify: I meant that if I left the value of the property as 1.2.1 but used 
an external metastore of an older version, that didn't work. It gives a 
different error that I don't remember, but it was not NoSuchFieldError.
{quote}The field you show was added in 1.2.0, according to the JIRA, not 1.2.1 
Yet you say 1.2.0 works while 1.2.1 doesn't.{quote}
All versions upwards of 1.2.0 do work fine. The minimum version of external 
metastore that works is 1.2.0 (I was just highlighting that I didn't have to upgrade 
up to 1.2.1). All versions below 1.2.0 give a NoSuchFieldError for different 
fields.


was (Author: sunil.rangwani):
{quote}If this were true, that it doesn't work with 1.2.1 because of a 
NoSuchFieldError, then it would not have compiled vs Hive 1.2.1 even. Right? 
{quote}
Sorry, to clarify I meant if I left the value of the property as 1.2.1 but used 
an external metastore of an older version, that didnt work. It gives a 
different error I don't remember but it was not NoSuchFieldError  
{quote}The field you show was added in 1.2.0, according to the JIRA, not 1.2.1 
Yet you say 1.2.0 works while 1.2.1 doesn't.{quote}
All versions upwards of 1.2.0 does work fine. The minimum version of external 
metastore that works is 1.2.0 (I was just highlighting I didn't have to upgrade 
upto 1.2.1) All versions upto 1.2.0 give a NoSuchFieldError for different 
fields.   

> Spark SQL 1.6.0 does not work with external Hive metastore version lower than 
> 1.2.0; its not backwards compatible with earlier version
> --
>
> Key: SPARK-14492
> URL: https://issues.apache.org/jira/browse/SPARK-14492
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Sunil Rangwani
>Priority: Critical
>
> Spark SQL, when configured with a Hive version lower than 1.2.0, throws a 
> java.lang.NoSuchFieldError for the field METASTORE_CLIENT_SOCKET_LIFETIME 
> because this field was introduced in Hive 1.2.0, so it's not possible to use 
> a Hive metastore version lower than 1.2.0 with Spark. The details of the Hive 
> changes can be found here: https://issues.apache.org/jira/browse/HIVE-9508 
> {code:java}
> Exception in thread "main" java.lang.NoSuchFieldError: 
> METASTORE_CLIENT_SOCKET_LIFETIME
>   at 
> org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:500)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237)
>   at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:441)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at org.apache.spark.sql.SQLContext.(SQLContext.scala:271)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:90)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:101)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:58)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:267)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:139)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>   at org.apache.spark.deploy.SparkSubmit$.ma

[jira] [Comment Edited] (SPARK-5158) Allow for keytab-based HDFS security in Standalone mode

2017-04-04 Thread Ian Hummel (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955528#comment-15955528
 ] 

Ian Hummel edited comment on SPARK-5158 at 4/4/17 6:07 PM:
---

At Bloomberg we've been working on a solution to this issue so we can access 
kerberized HDFS clusters from standalone Spark installations we run on our 
internal cloud infrastructure.

Folks who are interested can try out a patch at 
https://github.com/themodernlife/spark/tree/spark-5158.  It extends standalone 
mode to support configuration related to {{\-\-principal}} and {{\-\-keytab}}.

The main changes are
- Refactor {{ConfigurableCredentialManager}} and related 
{{CredentialProviders}} so that they are no longer tied to YARN
- Setup credential renewal/updating from within the 
{{StandaloneSchedulerBackend}}
- Ensure executors/drivers are able to find initial tokens for contacting HDFS 
and renew them at regular intervals

The implementation does basically the same thing as the YARN backend.  The 
keytab is copied to driver/executors through an environment variable in the 
{{ApplicationDescription}}.  I might be wrong, but I'm assuming proper 
{{spark.authenticate}} setup would ensure it's encrypted over-the-wire (can 
anyone confirm?).  Credentials on the executors and the driver (cluster mode) 
are written to disk as whatever user the Spark daemon runs as.  Open to 
suggestions on whether it's worth tightening that up.

Would appreciate any feedback from the community.


was (Author: themodernlife):
At Bloomberg we've been working on a solution to this issue so we can access 
kerberized HDFS clusters from standalone Spark installations we run on our 
internal cloud infrastructure.

Folks who are interested can try out a patch at 
https://github.com/themodernlife/spark/tree/spark-5158.  It extends standalone 
mode to support configuration related to {{--principal}} and {{--keytab}}.

The main changes are
- Refactor {{ConfigurableCredentialManager}} and related 
{{CredentialProviders}} so that they are no longer tied to YARN
- Setup credential renewal/updating from within the 
{{StandaloneSchedulerBackend}}
- Ensure executors/drivers are able to find initial tokens for contacting HDFS 
and renew them at regular intervals

The implementation does basically the same thing as the YARN backend.  The 
keytab is copied to driver/executors through an environment variable in the 
{{ApplicationDescription}}.  I might be wrong, but I'm assuming proper 
{{spark.authenticate}} setup would ensure it's encrypted over-the-wire (can 
anyone confirm?).  Credentials on the executors and the driver (cluster mode) 
are written to disk as whatever user the Spark daemon runs as.  Open to 
suggestions on whether it's worth tightening that up.

Would appreciate any feedback from the community.

> Allow for keytab-based HDFS security in Standalone mode
> ---
>
> Key: SPARK-5158
> URL: https://issues.apache.org/jira/browse/SPARK-5158
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Reporter: Patrick Wendell
>Assignee: Matthew Cheah
>Priority: Critical
>
> There have been a handful of patches for allowing access to Kerberized HDFS 
> clusters in standalone mode. The main reason we haven't accepted these 
> patches has been that they rely on insecure distribution of token files from 
> the driver to the other components.
> As a simpler solution, I wonder if we should just provide a way to have the 
> Spark driver and executors independently log in and acquire credentials using 
> a keytab. This would work for users who have dedicated, single-tenant 
> Spark clusters (i.e. they are willing to have a keytab on every machine 
> running Spark for their application). It wouldn't address all possible 
> deployment scenarios, but if it's simple I think it's worth considering.
> This would also work for Spark streaming jobs, which often run on dedicated 
> hardware since they are long-running services.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5158) Allow for keytab-based HDFS security in Standalone mode

2017-04-04 Thread Ian Hummel (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955528#comment-15955528
 ] 

Ian Hummel commented on SPARK-5158:
---

At Bloomberg we've been working on a solution to this issue so we can access 
kerberized HDFS clusters from standalone Spark installations we run on our 
internal cloud infrastructure.

Folks who are interested can try out a patch at 
https://github.com/themodernlife/spark/tree/spark-5158.  It extends standalone 
mode to support configuration related to {{--principal}} and {{--keytab}}.

The main changes are
- Refactor {{ConfigurableCredentialManager}} and related 
{{CredentialProviders}} so that they are no longer tied to YARN
- Setup credential renewal/updating from within the 
{{StandaloneSchedulerBackend}}
- Ensure executors/drivers are able to find initial tokens for contacting HDFS 
and renew them at regular intervals

The implementation does basically the same thing as the YARN backend.  The 
keytab is copied to driver/executors through an environment variable in the 
{{ApplicationDescription}}.  I might be wrong, but I'm assuming proper 
{{spark.authenticate}} setup would ensure it's encrypted over-the-wire (can 
anyone confirm?).  Credentials on the executors and the driver (cluster mode) 
are written to disk as whatever user the Spark daemon runs as.  Open to 
suggestions on whether it's worth tightening that up.

Would appreciate any feedback from the community.

> Allow for keytab-based HDFS security in Standalone mode
> ---
>
> Key: SPARK-5158
> URL: https://issues.apache.org/jira/browse/SPARK-5158
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Reporter: Patrick Wendell
>Assignee: Matthew Cheah
>Priority: Critical
>
> There have been a handful of patches for allowing access to Kerberized HDFS 
> clusters in standalone mode. The main reason we haven't accepted these 
> patches has been that they rely on insecure distribution of token files from 
> the driver to the other components.
> As a simpler solution, I wonder if we should just provide a way to have the 
> Spark driver and executors independently log in and acquire credentials using 
> a keytab. This would work for users who have dedicated, single-tenant 
> Spark clusters (i.e. they are willing to have a keytab on every machine 
> running Spark for their application). It wouldn't address all possible 
> deployment scenarios, but if it's simple I think it's worth considering.
> This would also work for Spark streaming jobs, which often run on dedicated 
> hardware since they are long-running services.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10364) Support Parquet logical type TIMESTAMP_MILLIS

2017-04-04 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-10364:
---

Assignee: Dilip Biswal

> Support Parquet logical type TIMESTAMP_MILLIS
> -
>
> Key: SPARK-10364
> URL: https://issues.apache.org/jira/browse/SPARK-10364
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Cheng Lian
>Assignee: Dilip Biswal
> Fix For: 2.2.0
>
>
> The {{TimestampType}} in Spark SQL is of microsecond precision. Ideally, we 
> should convert Spark SQL timestamp values into Parquet {{TIMESTAMP_MICROS}}, 
> but unfortunately parquet-mr doesn't support it yet.
> For the read path, we should be able to read {{TIMESTAMP_MILLIS}} Parquet 
> values and pad a 0 microsecond part onto the read values.
> For the write path, we currently write timestamps as {{INT96}}, similar 
> to Impala and Hive. One alternative is to add a separate SQL 
> option that lets users write Spark SQL timestamp values as 
> {{TIMESTAMP_MILLIS}}. Of course, in this way the microsecond part will be 
> truncated.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5158) Allow for keytab-based HDFS security in Standalone mode

2017-04-04 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955564#comment-15955564
 ] 

Apache Spark commented on SPARK-5158:
-

User 'themodernlife' has created a pull request for this issue:
https://github.com/apache/spark/pull/17530

> Allow for keytab-based HDFS security in Standalone mode
> ---
>
> Key: SPARK-5158
> URL: https://issues.apache.org/jira/browse/SPARK-5158
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Reporter: Patrick Wendell
>Assignee: Matthew Cheah
>Priority: Critical
>
> There have been a handful of patches for allowing access to Kerberized HDFS 
> clusters in standalone mode. The main reason we haven't accepted these 
> patches has been that they rely on insecure distribution of token files from 
> the driver to the other components.
> As a simpler solution, I wonder if we should just provide a way to have the 
> Spark driver and executors independently log in and acquire credentials using 
> a keytab. This would work for users who have dedicated, single-tenant 
> Spark clusters (i.e. they are willing to have a keytab on every machine 
> running Spark for their application). It wouldn't address all possible 
> deployment scenarios, but if it's simple I think it's worth considering.
> This would also work for Spark streaming jobs, which often run on dedicated 
> hardware since they are long-running services.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-20191) RackResolver not correctly being overridden in YARN tests

2017-04-04 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-20191.

   Resolution: Fixed
 Assignee: Marcelo Vanzin
Fix Version/s: 2.1.2
   2.2.0

> RackResolver not correctly being overridden in YARN tests
> -
>
> Key: SPARK-20191
> URL: https://issues.apache.org/jira/browse/SPARK-20191
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.0.3, 2.1.1, 2.2.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
> Fix For: 2.2.0, 2.1.2
>
>
> YARN tests currently try to override YARN's RackResolver, but that class 
> self-initializes the first time it's called, storing state in static 
> variables and ignoring any further config params that might override the 
> initial behavior.
> So we need a better solution for Spark tests, so that tests such as 
> {{LocalityPlacementStrategySuite}} don't flood the DNS server with requests 
> (making the test really slow in certain environments).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-20204) remove SimpleCatalystConf and CatalystConf type alias

2017-04-04 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-20204.
-
   Resolution: Fixed
Fix Version/s: 2.2.0

> remove SimpleCatalystConf and CatalystConf type alias
> -
>
> Key: SPARK-20204
> URL: https://issues.apache.org/jira/browse/SPARK-20204
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
> Fix For: 2.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20153) Support Multiple aws credentials in order to access multiple Hive on S3 table in spark application

2017-04-04 Thread Franck Tago (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955695#comment-15955695
 ] 

Franck Tago commented on SPARK-20153:
-

Oh, I would definitely not consider including the key in the URL. It is a 
gigantic security hole in my opinion. Moreover, consider that I am dealing with 
Hive on S3, where the URI is part of the table metadata. How would that work in 
this case?

Is there a way to encode the accessId and secret key before calling 

Does Spark provide any way of masking or hiding the accessId and secretKey?

> Support Multiple aws credentials in order to access multiple Hive on S3 table 
> in spark application 
> ---
>
> Key: SPARK-20153
> URL: https://issues.apache.org/jira/browse/SPARK-20153
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.0.1, 2.1.0
>Reporter: Franck Tago
>Priority: Minor
>
> I need to access multiple Hive tables in my Spark application, where each Hive 
> table is 
> 1- an external table with data sitting on S3
> 2- owned by a different AWS user, so I need to provide different 
> AWS credentials. 
> I am familiar with setting the AWS credentials in the Hadoop configuration 
> object, but that does not really help me because I can only set one pair of 
> (fs.s3a.awsAccessKeyId, fs.s3a.awsSecretAccessKey).
> From my research, there is no easy or elegant way to do this in Spark.
> Why is that?
> How do I address this use case?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20203) Change default maxPatternLength value to Int.MaxValue in PrefixSpan

2017-04-04 Thread yuhao yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955705#comment-15955705
 ] 

yuhao yang commented on SPARK-20203:


[~Syrux] Since you have some experience using PrefixSpan, I'd like to have 
your input (or better, a contribution) on 
https://issues.apache.org/jira/browse/SPARK-20114 . 

> Change default maxPatternLength value to Int.MaxValue in PrefixSpan
> ---
>
> Key: SPARK-20203
> URL: https://issues.apache.org/jira/browse/SPARK-20203
> Project: Spark
>  Issue Type: Wish
>  Components: MLlib
>Affects Versions: 2.1.0
>Reporter: Cyril de Vogelaere
>Priority: Trivial
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> I think changing the default value to Int.MaxValue would be more 
> user-friendly, at least for new users.
> Personally, when I run an algorithm, I expect it to find all solutions by 
> default, and a limited number of them when I set the parameters to do so.
> The current implementation limits the length of solution patterns to 10, 
> preventing all solutions from being printed when running slightly large 
> datasets.
> I feel like that should be changed, but since this would change the default 
> behavior of PrefixSpan, I think asking for the community's opinion should 
> come first. So, what do you think?
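
For context, a minimal sketch of the workaround available today (assuming an 
existing SparkContext {{sc}}): raising {{maxPatternLength}} explicitly instead of 
relying on the default of 10.

{code}
// Hedged sketch using the standard MLlib PrefixSpan API; the tiny dataset is illustrative.
import org.apache.spark.mllib.fpm.PrefixSpan

val sequences = sc.parallelize(Seq(
  Array(Array(1, 2), Array(3)),
  Array(Array(1), Array(3, 2), Array(1, 2)),
  Array(Array(1, 2), Array(5)),
  Array(Array(6))
), 2).cache()

val prefixSpan = new PrefixSpan()
  .setMinSupport(0.5)
  .setMaxPatternLength(Int.MaxValue)  // find all patterns, as this issue proposes by default

val model = prefixSpan.run(sequences)
model.freqSequences.collect().foreach { fs =>
  println(fs.sequence.map(_.mkString("[", ",", "]")).mkString("<", ",", ">") + ": " + fs.freq)
}
{code}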



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17495) Hive hash implementation

2017-04-04 Thread Dominic Ricard (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955714#comment-15955714
 ] 

Dominic Ricard commented on SPARK-17495:


[~tejasp] We use murmur3 hash internally in some of our data pipelines (non-SQL), 
and I would like to know if the goal of this task is to expose a new UDF 
(e.g. murmur3()), similar to md5() and hash()?

I believe that would be the best approach to preserve compatibility with 
previously generated data and queries using hash().
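
For illustration only (assuming a SparkSession named {{spark}}), the existing 
{{hash()}} function already exposes the Murmur3-based hash as an expression; 
whether a separately named {{murmur3()}} UDF gets added is exactly the question 
above.

{code}
// Hedged sketch: the built-in hash() expression (Murmur3-based) next to the plain column.
import org.apache.spark.sql.functions.{col, hash}

val df = spark.range(3).toDF("id")
df.select(col("id"), hash(col("id")).as("murmur3_hash")).show()
{code}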



> Hive hash implementation
> 
>
> Key: SPARK-17495
> URL: https://issues.apache.org/jira/browse/SPARK-17495
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Tejas Patil
>Assignee: Tejas Patil
>Priority: Minor
> Fix For: 2.2.0
>
>
> Spark internally uses Murmur3Hash for partitioning. This is different from 
> the one used by Hive. For queries which use bucketing, this leads to different 
> results if one tries the same query on both engines. For us, we want users to 
> have backward compatibility so that one can switch parts of applications 
> across the engines without observing regressions.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18737) Serialization setting "spark.serializer" ignored in Spark 2.x

2017-04-04 Thread Prasanth (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955747#comment-15955747
 ] 

Prasanth commented on SPARK-18737:
--

We are using 2.0.1. Can you tell us how to "disable the Kryo auto-pick for 
streaming from the Java API" as a workaround?
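
Not an authoritative answer, but for anyone following along, a minimal sketch of 
forcing the Java serializer on the conf used to build the StreamingContext; 
whether this actually bypasses the Kryo auto-pick is exactly what this ticket 
questions.

{code}
// Hedged sketch only; the app name and batch interval are placeholders.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("java-serializer-example")
  .set("spark.serializer", "org.apache.spark.serializer.JavaSerializer")
  .set("spark.kryo.registrationRequired", "false")

val ssc = new StreamingContext(conf, Seconds(10))
{code}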

> Serialization setting "spark.serializer" ignored in Spark 2.x
> -
>
> Key: SPARK-18737
> URL: https://issues.apache.org/jira/browse/SPARK-18737
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Dr. Michael Menzel
>
> The following exception occurs although the JavaSerializer has been activated:
> 16/11/22 10:49:24 INFO TaskSetManager: Starting task 0.0 in stage 9.0 (TID 
> 77, ip-10-121-14-147.eu-central-1.compute.internal, partition 1, RACK_LOCAL, 
> 5621 bytes)
> 16/11/22 10:49:24 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching 
> task 77 on executor id: 2 hostname: 
> ip-10-121-14-147.eu-central-1.compute.internal.
> 16/11/22 10:49:24 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory 
> on ip-10-121-14-147.eu-central-1.compute.internal:45059 (size: 879.0 B, free: 
> 410.4 MB)
> 16/11/22 10:49:24 WARN TaskSetManager: Lost task 0.0 in stage 9.0 (TID 77, 
> ip-10-121-14-147.eu-central-1.compute.internal): 
> com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 
> 13994
> at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137)
> at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781)
> at 
> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:229)
> at 
> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:169)
> at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at org.apache.spark.util.NextIterator.foreach(NextIterator.scala:21)
> at 
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
> at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
> at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
> at 
> scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
> at org.apache.spark.util.NextIterator.to(NextIterator.scala:21)
> at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
> at org.apache.spark.util.NextIterator.toBuffer(NextIterator.scala:21)
> at 
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
> at org.apache.spark.util.NextIterator.toArray(NextIterator.scala:21)
> at 
> org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927)
> at 
> org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927)
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> at org.apache.spark.scheduler.Task.run(Task.scala:86)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> The code runs perfectly with Spark 1.6.0. Since we moved to 2.0.0 and now 
> 2.0.1, we see the Kryo deserialization exception, and over time the Spark 
> streaming job stops processing since too many tasks have failed.
> Our action was to use conf.set("spark.serializer", 
> "org.apache.spark.serializer.JavaSerializer") and to disable Kryo class 
> registration with conf.set("spark.kryo.registrationRequired", false). We hope 
> to identify the root cause of the exception. 
> However, setting the serializer to JavaSerializer is obviously ignored by the 
> Spark internals. Despite the setting, we still see the exception printed in 
> the log and tasks fail. The occurrence seems to be non-deterministic, but to 
> become more frequent over time.
> Several questions we could not answer during our troubleshooting:
> 1. How can the debug log for Kryo be enabled? -- We tried following the 
> minilog documentation, but no output can be found.
> 2. Is the serializer setting effective for Spark internal serializat

[jira] [Created] (SPARK-20213) DataFrameWriter operations do not show up in SQL tab

2017-04-04 Thread Ryan Blue (JIRA)
Ryan Blue created SPARK-20213:
-

 Summary: DataFrameWriter operations do not show up in SQL tab
 Key: SPARK-20213
 URL: https://issues.apache.org/jira/browse/SPARK-20213
 Project: Spark
  Issue Type: Bug
  Components: SQL, Web UI
Affects Versions: 2.1.0, 2.0.2
Reporter: Ryan Blue


In 1.6.1, {{DataFrame}} writes started via {{DataFrameWriter}} actions like 
{{insertInto}} would show up in the SQL tab. In 2.0.0 and later, they no longer 
do. The problem is that 2.0.0 and later no longer wrap execution with 
{{SQLExecution.withNewExecutionId}}, which emits 
{{SparkListenerSQLExecutionStart}}.

Here are the relevant parts of the stack traces:
{code:title=Spark 1.6.1}
org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
org.apache.spark.sql.execution.QueryExecution$$anonfun$toRdd$1.apply(QueryExecution.scala:56)
org.apache.spark.sql.execution.QueryExecution$$anonfun$toRdd$1.apply(QueryExecution.scala:56)
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:53)
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:56)
 => holding 
Monitor(org.apache.spark.sql.hive.HiveContext$QueryExecution@424773807})
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:196)
{code}

{code:title=Spark 2.0.0}
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
 => holding Monitor(org.apache.spark.sql.execution.QueryExecution@490977924})
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:301)
{code}

I think this was introduced by 
[54d23599|https://github.com/apache/spark/commit/54d23599]. The fix should be 
to add withNewExecutionId to 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L610
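
A rough sketch of the shape of that fix, not the actual patch, assuming the 
internal 2.x signature {{SQLExecution.withNewExecutionId(sparkSession, 
queryExecution)(body)}}; {{runWriteAction}} is a hypothetical helper name:

{code}
// Hedged sketch: wrapping a write action so SparkListenerSQLExecutionStart is emitted.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.{QueryExecution, SQLExecution}

def runWriteAction[T](session: SparkSession, qe: QueryExecution)(body: => T): T =
  SQLExecution.withNewExecutionId(session, qe) {
    body  // e.g. qe.toRdd / the physical insert
  }
{code}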



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-20214) pyspark.mllib SciPyTests test_serialize

2017-04-04 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-20214:
-

 Summary: pyspark.mllib SciPyTests test_serialize
 Key: SPARK-20214
 URL: https://issues.apache.org/jira/browse/SPARK-20214
 Project: Spark
  Issue Type: Bug
  Components: ML, MLlib, PySpark, Tests
Affects Versions: 2.0.2, 2.1.1, 2.2.0
Reporter: Joseph K. Bradley


I've seen a few failures of this line: 
https://github.com/apache/spark/blame/402bf2a50ddd4039ff9f376b641bd18fffa54171/python/pyspark/mllib/tests.py#L847

It converts a scipy.sparse.lil_matrix to a dok_matrix and then to a 
pyspark.mllib.linalg.Vector.  The failure happens in the conversion to a vector 
and indicates that the dok_matrix is not returning its values in sorted order. 
(Actually, the failure is in _convert_to_vector, which converts the dok_matrix 
to a csc_matrix and then passes the CSC data to the MLlib Vector constructor.) 
Here's the stack trace:
{code}
Traceback (most recent call last):
  File "/home/jenkins/workspace/python/pyspark/mllib/tests.py", line 847, in 
test_serialize
self.assertEqual(sv, _convert_to_vector(lil.todok()))
  File "/home/jenkins/workspace/python/pyspark/mllib/linalg/__init__.py", line 
78, in _convert_to_vector
return SparseVector(l.shape[0], csc.indices, csc.data)
  File "/home/jenkins/workspace/python/pyspark/mllib/linalg/__init__.py", line 
556, in __init__
% (self.indices[i], self.indices[i + 1]))
TypeError: Indices 3 and 1 are not strictly increasing
{code}

This seems like a bug in _convert_to_vector, where we really should check 
{{csc_matrix.has_sorted_indices}} first.

I haven't seen this bug in pyspark.ml.linalg, but it probably exists there too.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20153) Support Multiple aws credentials in order to access multiple Hive on S3 table in spark application

2017-04-04 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955956#comment-15955956
 ] 

Steve Loughran commented on SPARK-20153:


I'm glad we are both in agreement about not using secrets in URLs.

I'm afraid then, there's not much that can be done without upgrading to Hadoop 
2.8.x JARs. You'll get a lot of other S3A speedups too, so it's worth upgrading 
for S3 IO performance as well as security.
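
For readers wondering what the Hadoop 2.8.x upgrade buys here, a hedged sketch of 
S3A per-bucket credentials; bucket names and key values are placeholders, and the 
exact property names are an assumption based on the s3a per-bucket configuration 
pattern {{fs.s3a.bucket.<bucket>.*}}.

{code}
// Hedged sketch, assuming Hadoop 2.8+ s3a on the classpath and an existing SparkContext sc.
// Do not hard-code real secrets; these values are placeholders.
sc.hadoopConfiguration.set("fs.s3a.bucket.bucket-a.access.key", "<ACCESS_KEY_A>")
sc.hadoopConfiguration.set("fs.s3a.bucket.bucket-a.secret.key", "<SECRET_KEY_A>")
sc.hadoopConfiguration.set("fs.s3a.bucket.bucket-b.access.key", "<ACCESS_KEY_B>")
sc.hadoopConfiguration.set("fs.s3a.bucket.bucket-b.secret.key", "<SECRET_KEY_B>")
{code}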

> Support Multiple aws credentials in order to access multiple Hive on S3 table 
> in spark application 
> ---
>
> Key: SPARK-20153
> URL: https://issues.apache.org/jira/browse/SPARK-20153
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.0.1, 2.1.0
>Reporter: Franck Tago
>Priority: Minor
>
> I need to access multiple Hive tables in my Spark application, where each Hive 
> table is 
> 1- an external table with data sitting on S3
> 2- owned by a different AWS user, so I need to provide different 
> AWS credentials. 
> I am familiar with setting the AWS credentials in the Hadoop configuration 
> object, but that does not really help me because I can only set one pair of 
> (fs.s3a.awsAccessKeyId, fs.s3a.awsSecretAccessKey).
> From my research, there is no easy or elegant way to do this in Spark.
> Why is that?
> How do I address this use case?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-20215) ReuseExchange is broken in SparkSQL

2017-04-04 Thread Zhan Zhang (JIRA)
Zhan Zhang created SPARK-20215:
--

 Summary: ReuseExchange is broken in SparkSQL
 Key: SPARK-20215
 URL: https://issues.apache.org/jira/browse/SPARK-20215
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.1.0
Reporter: Zhan Zhang
Priority: Minor


Currently, if we have a query like A join B union A join C (with the same join 
key), table A will be scanned multiple times in SQL. This is because the 
MetastoreRelation is not shared by the two joins and the ExprId is different, so 
canonicalized in Expression will not be able to unify them; the two Exchanges are 
then not compatible and cannot be reused.
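
For concreteness, a hedged sketch of the query shape being described; table names 
{{a}}, {{b}}, {{c}} and the join key {{key}} are hypothetical, and {{b}} and 
{{c}} are assumed to share a schema so the union is valid.

{code}
// Hedged sketch, assuming a SparkSession named spark and registered tables a, b, c.
val a = spark.table("a")
val b = spark.table("b")
val c = spark.table("c")

val q = a.join(b, "key").union(a.join(c, "key"))
q.explain()  // ideally the scan/Exchange of table a would be reused across both joins
{code}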



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-14492) Spark SQL 1.6.0 does not work with external Hive metastore version lower than 1.2.0; its not backwards compatible with earlier version

2017-04-04 Thread Sunil Rangwani (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955510#comment-15955510
 ] 

Sunil Rangwani edited comment on SPARK-14492 at 4/4/17 10:42 PM:
-

{quote}If this were true, that it doesn't work with 1.2.1 because of a 
NoSuchFieldError, then it would not have compiled vs Hive 1.2.1 even. Right? 
{quote}
To clarify, I meant that if I left the value of the property as 1.2.1 but used 
an external metastore of an older version, that doesn't work. It gives a 
different error but not NoSuchFieldError  
{quote}The field you show was added in 1.2.0, according to the JIRA, not 1.2.1 
Yet you say 1.2.0 works while 1.2.1 doesn't.{quote}
All versions upwards of 1.2.0 do work fine. The minimum version of external 
metastore that works is 1.2.0 (I was just highlighting that I didn't have to upgrade 
up to 1.2.1). All versions below 1.2.0 give a NoSuchFieldError for different 
fields.


was (Author: sunil.rangwani):
{quote}If this were true, that it doesn't work with 1.2.1 because of a 
NoSuchFieldError, then it would not have compiled vs Hive 1.2.1 even. Right? 
{quote}
Sorry, to clarify I meant if I left the value of the property as 1.2.1 but used 
an external metastore of an older version, that didnt work. It gives a 
different error I don't remember but it was not NoSuchFieldError  
{quote}The field you show was added in 1.2.0, according to the JIRA, not 1.2.1 
Yet you say 1.2.0 works while 1.2.1 doesn't.{quote}
All versions upwards of 1.2.0 do work fine. The minimum version of external 
metastore that works is 1.2.0 (I was just highlighting I didn't have to upgrade 
upto 1.2.1) All versions upto 1.2.0 give a NoSuchFieldError for different 
fields.   

> Spark SQL 1.6.0 does not work with external Hive metastore version lower than 
> 1.2.0; its not backwards compatible with earlier version
> --
>
> Key: SPARK-14492
> URL: https://issues.apache.org/jira/browse/SPARK-14492
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Sunil Rangwani
>Priority: Critical
>
> Spark SQL, when configured with a Hive version lower than 1.2.0, throws a 
> java.lang.NoSuchFieldError for the field METASTORE_CLIENT_SOCKET_LIFETIME 
> because this field was introduced in Hive 1.2.0, so it's not possible to use 
> a Hive metastore version lower than 1.2.0 with Spark. The details of the Hive 
> changes can be found here: https://issues.apache.org/jira/browse/HIVE-9508 
> {code:java}
> Exception in thread "main" java.lang.NoSuchFieldError: 
> METASTORE_CLIENT_SOCKET_LIFETIME
>   at 
> org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:500)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237)
>   at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:441)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at org.apache.spark.sql.SQLContext.(SQLContext.scala:271)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:90)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:101)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:58)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:267)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:139)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:12

[jira] [Created] (SPARK-20216) Install pandoc on machine(s) used for packaging

2017-04-04 Thread holdenk (JIRA)
holdenk created SPARK-20216:
---

 Summary: Install pandoc on machine(s) used for packaging
 Key: SPARK-20216
 URL: https://issues.apache.org/jira/browse/SPARK-20216
 Project: Spark
  Issue Type: Bug
  Components: Project Infra, PySpark
Affects Versions: 2.1.1, 2.2.0
Reporter: holdenk
Priority: Blocker


For Python packaging, pandoc is required to produce a reasonable package doc 
string. Whichever machine(s) are used for packaging should have both pandoc 
and pypandoc installed on them.
cc [~joshrosen] who I know was doing something related to this



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20144) spark.read.parquet no longer maintains ordering of the data

2017-04-04 Thread Li Jin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956030#comment-15956030
 ] 

Li Jin commented on SPARK-20144:


> When you save the sorted data into Parquet, only the data in individual 
> Parquet file can maintain the data ordering
Is this something that changed in Spark 2 as well? Does write.parquet no longer 
write parquet files in the order of partitions?
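
Not an answer to the question above, but for anyone relying on row order, a 
minimal hedged workaround sketch (assuming an explicit ordering column, here a 
hypothetical {{ts}}) is to re-apply the ordering after reading rather than 
depending on file or partition order:

{code}
// Hedged sketch, assuming a SparkSession named spark; "ts" is a placeholder ordering column.
val df = spark.read.parquet("/path/to/data")
val ordered = df.sort("ts")            // global order
// or: df.sortWithinPartitions("ts")   // if only per-partition order matters
{code}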

> spark.read.parquet no longer maintains ordering of the data
> -
>
> Key: SPARK-20144
> URL: https://issues.apache.org/jira/browse/SPARK-20144
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Li Jin
>
> Hi, we are trying to upgrade Spark from 1.6.3 to 2.0.2. One issue we found is 
> that when we read parquet files in 2.0.2, the ordering of rows in the resulting 
> dataframe is not the same as the ordering of rows in the dataframe that the 
> parquet file was produced with. 
> This is because FileSourceStrategy.scala combines the parquet files into 
> fewer partitions and also reorders them. This breaks our workflows because 
> they assume the ordering of the data. 
> Is this considered a bug? Also, FileSourceStrategy and FileSourceScanExec 
> changed quite a bit from 2.0.2 to 2.1, so we are not sure if this is an issue 
> with 2.1.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20216) Install pandoc on machine(s) used for packaging

2017-04-04 Thread Michael Armbrust (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956045#comment-15956045
 ] 

Michael Armbrust commented on SPARK-20216:
--

I think it all runs on 
https://amplab.cs.berkeley.edu/jenkins/computer/amp-jenkins-worker-01/

> Install pandoc on machine(s) used for packaging
> ---
>
> Key: SPARK-20216
> URL: https://issues.apache.org/jira/browse/SPARK-20216
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra, PySpark
>Affects Versions: 2.1.1, 2.2.0
>Reporter: holdenk
>Priority: Blocker
>
> For Python packaging, pandoc is required to produce a reasonable package 
> doc string. Whichever machine(s) are used for packaging should have both 
> pandoc and pypandoc installed on them.
> cc [~joshrosen] who I know was doing something related to this



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-19716) Dataset should allow by-name resolution for struct type elements in array

2017-04-04 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian reassigned SPARK-19716:
--

Assignee: Wenchen Fan

> Dataset should allow by-name resolution for struct type elements in array
> -
>
> Key: SPARK-19716
> URL: https://issues.apache.org/jira/browse/SPARK-19716
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>
> if we have a DataFrame with schema {{a: int, b: int, c: int}}, and convert it 
> to Dataset with {{case class Data(a: Int, c: Int)}}, it works and we will 
> extract the `a` and `c` columns to build the Data.
> However, if the struct is inside an array, e.g. the schema is {{arr: 
> array<struct<a: int, b: int, c: int>>}}, and we want to convert it to a Dataset 
> with {{case class ComplexData(arr: Seq[Data])}}, we will fail. The reason is 
> that, to allow compatible types (e.g. converting {{a: int}} to {{case class 
> A(a: Long)}}), we add a cast for each field except struct type fields, because 
> struct types are flexible and the number of columns can mismatch. We should 
> probably also skip the cast for array and map types.
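
A minimal sketch of the scenario described above (run, e.g., in a spark-shell; 
{{Full}} is a hypothetical case class standing in for the stored 
{{struct<a, b, c>}}):

{code}
// Hedged sketch; per this issue, the .as[ComplexData] step is what by-name resolution enables.
case class Full(a: Int, b: Int, c: Int)   // what is physically stored in the array
case class Data(a: Int, c: Int)
case class ComplexData(arr: Seq[Data])

import spark.implicits._

val df = Seq(Tuple1(Seq(Full(1, 2, 3), Full(4, 5, 6)))).toDF("arr")  // arr: array<struct<a,b,c>>
val ds = df.as[ComplexData]   // resolves Data(a, c) by name inside the array element struct
ds.show()
{code}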



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-19716) Dataset should allow by-name resolution for struct type elements in array

2017-04-04 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian resolved SPARK-19716.

   Resolution: Fixed
Fix Version/s: 2.3.0

Issue resolved by pull request 17398
[https://github.com/apache/spark/pull/17398]

> Dataset should allow by-name resolution for struct type elements in array
> -
>
> Key: SPARK-19716
> URL: https://issues.apache.org/jira/browse/SPARK-19716
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
> Fix For: 2.3.0
>
>
> if we have a DataFrame with schema {{a: int, b: int, c: int}}, and convert it 
> to Dataset with {{case class Data(a: Int, c: Int)}}, it works and we will 
> extract the `a` and `c` columns to build the Data.
> However, if the struct is inside an array, e.g. the schema is {{arr: 
> array<struct<a: int, b: int, c: int>>}}, and we want to convert it to a Dataset 
> with {{case class ComplexData(arr: Seq[Data])}}, we will fail. The reason is 
> that, to allow compatible types (e.g. converting {{a: int}} to {{case class 
> A(a: Long)}}), we add a cast for each field except struct type fields, because 
> struct types are flexible and the number of columns can mismatch. We should 
> probably also skip the cast for array and map types.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19716) Dataset should allow by-name resolution for struct type elements in array

2017-04-04 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian updated SPARK-19716:
---
Fix Version/s: (was: 2.3.0)
   2.2.0

> Dataset should allow by-name resolution for struct type elements in array
> -
>
> Key: SPARK-19716
> URL: https://issues.apache.org/jira/browse/SPARK-19716
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
> Fix For: 2.2.0
>
>
> if we have a DataFrame with schema {{a: int, b: int, c: int}}, and convert it 
> to Dataset with {{case class Data(a: Int, c: Int)}}, it works and we will 
> extract the `a` and `c` columns to build the Data.
> However, if the struct is inside an array, e.g. the schema is {{arr: 
> array<struct<a: int, b: int, c: int>>}}, and we want to convert it to a Dataset 
> with {{case class ComplexData(arr: Seq[Data])}}, we will fail. The reason is 
> that, to allow compatible types (e.g. converting {{a: int}} to {{case class 
> A(a: Long)}}), we add a cast for each field except struct type fields, because 
> struct types are flexible and the number of columns can mismatch. We should 
> probably also skip the cast for array and map types.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-20217) Executor should not fail stage if killed task throws non-interrupted exception

2017-04-04 Thread Eric Liang (JIRA)
Eric Liang created SPARK-20217:
--

 Summary: Executor should not fail stage if killed task throws 
non-interrupted exception
 Key: SPARK-20217
 URL: https://issues.apache.org/jira/browse/SPARK-20217
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.2.0
Reporter: Eric Liang


This is reproducible as follows. Run the following, and then use 
SparkContext.killTaskAttempt to kill one of the tasks. The entire stage will 
fail since we threw a RuntimeException instead of InterruptedException.

We should probably unconditionally return TaskKilled instead of TaskFailed if 
the task was killed by the driver, regardless of the actual exception thrown.

{code}
spark.range(100).repartition(100).foreach { i =>
  try {
Thread.sleep(1000)
  } catch {
case t: InterruptedException =>
  throw new RuntimeException(t)
  }
}
{code}
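
For completeness, a hedged sketch of the other half of the repro, driven from a 
separate thread on the driver; the task attempt ID below is a placeholder you 
would take from the UI or a {{SparkListener}}.

{code}
// Hedged sketch; killTaskAttempt is the SparkContext API named above, values are illustrative.
val someTaskAttemptId: Long = 123L  // hypothetical ID for illustration
sc.killTaskAttempt(someTaskAttemptId, interruptThread = true,
  reason = "testing SPARK-20217")
{code}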




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20217) Executor should not fail stage if killed task throws non-interrupted exception

2017-04-04 Thread Eric Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Liang updated SPARK-20217:
---
Description: 
This is reproducible as follows. Run the following, and then use 
SparkContext.killTaskAttempt to kill one of the tasks. The entire stage will 
fail since we threw a RuntimeException instead of InterruptedException.

We should probably unconditionally return TaskKilled instead of TaskFailed if 
the task was killed by the driver, regardless of the actual exception thrown.

{code}
spark.range(100).repartition(100).foreach { i =>
  try {
Thread.sleep(1000)
  } catch {
case t: InterruptedException =>
  throw new RuntimeException(t)
  }
}
{code}

Based on the code in TaskSetManager, I think this also affects kills of 
speculative tasks. However, since the number of speculated tasks is few, and 
usually you need to fail a task a few times before the stage is cancelled, 
probably no-one noticed this in production.

  was:
This is reproducible as follows. Run the following, and then use 
SparkContext.killTaskAttempt to kill one of the tasks. The entire stage will 
fail since we threw a RuntimeException instead of InterruptedException.

We should probably unconditionally return TaskKilled instead of TaskFailed if 
the task was killed by the driver, regardless of the actual exception thrown.

{code}
spark.range(100).repartition(100).foreach { i =>
  try {
Thread.sleep(1000)
  } catch {
case t: InterruptedException =>
  throw new RuntimeException(t)
  }
}
{code}



> Executor should not fail stage if killed task throws non-interrupted exception
> --
>
> Key: SPARK-20217
> URL: https://issues.apache.org/jira/browse/SPARK-20217
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Eric Liang
>
> This is reproducible as follows. Run the following, and then use 
> SparkContext.killTaskAttempt to kill one of the tasks. The entire stage will 
> fail since we threw a RuntimeException instead of InterruptedException.
> We should probably unconditionally return TaskKilled instead of TaskFailed if 
> the task was killed by the driver, regardless of the actual exception thrown.
> {code}
> spark.range(100).repartition(100).foreach { i =>
>   try {
>     Thread.sleep(1000)
>   } catch {
>     case t: InterruptedException =>
>       throw new RuntimeException(t)
>   }
> }
> {code}
> Based on the code in TaskSetManager, I think this also affects kills of 
> speculative tasks. However, since the number of speculative tasks is small, and 
> a task usually has to fail a few times before the stage is cancelled, 
> probably no one has noticed this in production.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-20183) Add outlierRatio option to testOutliersWithSmallWeights

2017-04-04 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley resolved SPARK-20183.
---
   Resolution: Fixed
Fix Version/s: 2.2.0

Issue resolved by pull request 17501
[https://github.com/apache/spark/pull/17501]

> Add outlierRatio option to testOutliersWithSmallWeights
> ---
>
> Key: SPARK-20183
> URL: https://issues.apache.org/jira/browse/SPARK-20183
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, Tests
>Affects Versions: 2.1.0
>Reporter: Joseph K. Bradley
>Assignee: Seth Hendrickson
> Fix For: 2.2.0
>
>
> Part 1 of parent PR: Add flexibility to testOutliersWithSmallWeights test.  
> See https://github.com/apache/spark/pull/16722 for perspective.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20217) Executor should not fail stage if killed task throws non-interrupted exception

2017-04-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20217:


Assignee: Apache Spark

> Executor should not fail stage if killed task throws non-interrupted exception
> --
>
> Key: SPARK-20217
> URL: https://issues.apache.org/jira/browse/SPARK-20217
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Eric Liang
>Assignee: Apache Spark
>
> This is reproducible as follows. Run the following, and then use 
> SparkContext.killTaskAttempt to kill one of the tasks. The entire stage will 
> fail since we threw a RuntimeException instead of InterruptedException.
> We should probably unconditionally return TaskKilled instead of TaskFailed if 
> the task was killed by the driver, regardless of the actual exception thrown.
> {code}
> spark.range(100).repartition(100).foreach { i =>
>   try {
>     Thread.sleep(1000)
>   } catch {
>     case t: InterruptedException =>
>       throw new RuntimeException(t)
>   }
> }
> {code}
> Based on the code in TaskSetManager, I think this also affects kills of 
> speculative tasks. However, since the number of speculative tasks is small, and 
> a task usually has to fail a few times before the stage is cancelled, 
> probably no one has noticed this in production.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20217) Executor should not fail stage if killed task throws non-interrupted exception

2017-04-04 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956093#comment-15956093
 ] 

Apache Spark commented on SPARK-20217:
--

User 'ericl' has created a pull request for this issue:
https://github.com/apache/spark/pull/17531

> Executor should not fail stage if killed task throws non-interrupted exception
> --
>
> Key: SPARK-20217
> URL: https://issues.apache.org/jira/browse/SPARK-20217
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Eric Liang
>
> This is reproducible as follows. Run the following, and then use 
> SparkContext.killTaskAttempt to kill one of the tasks. The entire stage will 
> fail since we threw a RuntimeException instead of InterruptedException.
> We should probably unconditionally return TaskKilled instead of TaskFailed if 
> the task was killed by the driver, regardless of the actual exception thrown.
> {code}
> spark.range(100).repartition(100).foreach { i =>
>   try {
>     Thread.sleep(1000)
>   } catch {
>     case t: InterruptedException =>
>       throw new RuntimeException(t)
>   }
> }
> {code}
> Based on the code in TaskSetManager, I think this also affects kills of 
> speculative tasks. However, since the number of speculative tasks is small, and 
> a task usually has to fail a few times before the stage is cancelled, 
> probably no one has noticed this in production.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20217) Executor should not fail stage if killed task throws non-interrupted exception

2017-04-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20217:


Assignee: (was: Apache Spark)

> Executor should not fail stage if killed task throws non-interrupted exception
> --
>
> Key: SPARK-20217
> URL: https://issues.apache.org/jira/browse/SPARK-20217
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Eric Liang
>
> This is reproducible as follows. Run the following, and then use 
> SparkContext.killTaskAttempt to kill one of the tasks. The entire stage will 
> fail since we threw a RuntimeException instead of InterruptedException.
> We should probably unconditionally return TaskKilled instead of TaskFailed if 
> the task was killed by the driver, regardless of the actual exception thrown.
> {code}
> spark.range(100).repartition(100).foreach { i =>
>   try {
>     Thread.sleep(1000)
>   } catch {
>     case t: InterruptedException =>
>       throw new RuntimeException(t)
>   }
> }
> {code}
> Based on the code in TaskSetManager, I think this also affects kills of 
> speculative tasks. However, since the number of speculative tasks is small, and 
> a task usually has to fail a few times before the stage is cancelled, 
> probably no one has noticed this in production.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-20003) FPGrowthModel setMinConfidence should affect rules generation and transform

2017-04-04 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley resolved SPARK-20003.
---
   Resolution: Fixed
Fix Version/s: 2.2.0

Issue resolved by pull request 17336
[https://github.com/apache/spark/pull/17336]

> FPGrowthModel setMinConfidence should affect rules generation and transform
> ---
>
> Key: SPARK-20003
> URL: https://issues.apache.org/jira/browse/SPARK-20003
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.2.0
>Reporter: yuhao yang
>Assignee: yuhao yang
>Priority: Minor
> Fix For: 2.2.0
>
>
> I was doing some testing and found this issue. FPGrowthModel's setMinConfidence 
> should affect rule generation and transform. 
> Currently, associationRules in FPGrowthModel is a lazy val, so 
> setMinConfidence in FPGrowthModel has no effect once associationRules has been 
> computed.
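A hedged sketch of the behaviour described above, assuming the spark.ml FPGrowth API in a spark-shell session with implicits in scope (the transactions are illustrative):

{code}
import org.apache.spark.ml.fpm.FPGrowth

val dataset = spark.createDataset(Seq("1 2 5", "1 2 3 5", "1 2"))
  .map(_.split(" "))
  .toDF("items")

val model = new FPGrowth()
  .setItemsCol("items")
  .setMinSupport(0.5)
  .setMinConfidence(0.6)
  .fit(dataset)

model.associationRules.show()   // materializes the lazy val with minConfidence = 0.6
model.setMinConfidence(0.9)     // before the fix, this no longer has any effect...
model.associationRules.show()   // ...the rules computed on the first call are reused
{code}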



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20207) Add ability to exclude current row in WindowSpec

2017-04-04 Thread Satyajit varma (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956134#comment-15956134
 ] 

Satyajit varma commented on SPARK-20207:


I can take up this ticket and will keep posting updates on my progress, as this 
is my first attempt at jumping into the Spark SQL codebase.

> Add ability to exclude current row in WindowSpec
> ---
>
> Key: SPARK-20207
> URL: https://issues.apache.org/jira/browse/SPARK-20207
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Mathew Wicks
>Priority: Minor
>
> It would be useful if we could implement a way to exclude the current row in 
> WindowSpec. (We can currently only select ranges of rows/time.)
> Currently, users have to resort to ridiculous measures to exclude the current 
> row from windowing aggregations. 
> As seen here:
> http://stackoverflow.com/questions/43180723/spark-sql-excluding-the-current-row-in-partition-by-windowing-functions/43198839#43198839
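A hedged sketch of the kind of workaround referred to above, assuming a spark-shell session (column names and data are illustrative): aggregate over the whole partition, then subtract the current row's own contribution.

{code}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val df = Seq(("a", 1), ("a", 2), ("a", 3), ("b", 10)).toDF("grp", "value")

val w = Window.partitionBy("grp")

// "sum over the other rows of my partition" = partition sum minus my own value
df.withColumn("sum_excluding_current", sum(col("value")).over(w) - col("value")).show()
{code}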



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20184) performance regression for complex/long sql when enable codegen

2017-04-04 Thread Fei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Wang updated SPARK-20184:
-
Description: 
The performance of the following SQL gets much worse in Spark 2.x with whole-stage 
codegen on than with it off.

SELECT
 sum(COUNTER_57) 
,sum(COUNTER_71) 
,sum(COUNTER_3)  
,sum(COUNTER_70) 
,sum(COUNTER_66) 
,sum(COUNTER_75) 
,sum(COUNTER_69) 
,sum(COUNTER_55) 
,sum(COUNTER_63) 
,sum(COUNTER_68) 
,sum(COUNTER_56) 
,sum(COUNTER_37) 
,sum(COUNTER_51) 
,sum(COUNTER_42) 
,sum(COUNTER_43) 
,sum(COUNTER_1)  
,sum(COUNTER_76) 
,sum(COUNTER_54) 
,sum(COUNTER_44) 
,sum(COUNTER_46) 
,DIM_1 
,DIM_2 
,DIM_3
FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100;

Num of rows of aggtable is about 3500.


whole stage codegen on(spark.sql.codegen.wholeStage = true):40s
whole stage codege  off(spark.sql.codegen.wholeStage = false):6s


After some analysis, I think this is related to the huge Java method (a method 
of thousands of lines) generated by codegen.
If I configure -XX:-DontCompileHugeMethods, the performance gets much 
better (about 7s).

  was:
The performance of following SQL get much worse in spark 2.x  in contrast with 
codegen off.

SELECT
 sum(COUNTER_57) 
,sum(COUNTER_71) 
,sum(COUNTER_3)  
,sum(COUNTER_70) 
,sum(COUNTER_66) 
,sum(COUNTER_75) 
,sum(COUNTER_69) 
,sum(COUNTER_55) 
,sum(COUNTER_63) 
,sum(COUNTER_68) 
,sum(COUNTER_56) 
,sum(COUNTER_37) 
,sum(COUNTER_51) 
,sum(COUNTER_42) 
,sum(COUNTER_43) 
,sum(COUNTER_1)  
,sum(COUNTER_76) 
,sum(COUNTER_54) 
,sum(COUNTER_44) 
,sum(COUNTER_46) 
,DIM_1 
,DIM_2 
,DIM_3
FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100;

Num of rows of aggtable is about 3500.


codegen on:40s
codegen off:6s


After some analysis i think this is related to the huge java method(a java 
method of thousand lines) which generated by codegen.
And If i config -XX:-DontCompileHugeMethods the performance get much 
better(about 7s).


> performance regression for complex/long sql when enable codegen
> ---
>
> Key: SPARK-20184
> URL: https://issues.apache.org/jira/browse/SPARK-20184
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.0, 2.1.0
>Reporter: Fei Wang
>
> The performance of the following SQL gets much worse in Spark 2.x with whole-stage 
> codegen on than with it off.
> SELECT
>sum(COUNTER_57) 
> ,sum(COUNTER_71) 
> ,sum(COUNTER_3)  
> ,sum(COUNTER_70) 
> ,sum(COUNTER_66) 
> ,sum(COUNTER_75) 
> ,sum(COUNTER_69) 
> ,sum(COUNTER_55) 
> ,sum(COUNTER_63) 
> ,sum(COUNTER_68) 
> ,sum(COUNTER_56) 
> ,sum(COUNTER_37) 
> ,sum(COUNTER_51) 
> ,sum(COUNTER_42) 
> ,sum(COUNTER_43) 
> ,sum(COUNTER_1)  
> ,sum(COUNTER_76) 
> ,sum(COUNTER_54) 
> ,sum(COUNTER_44) 
> ,sum(COUNTER_46) 
> ,DIM_1 
> ,DIM_2 
>   ,DIM_3
> FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100;
> Num of rows of aggtable is about 3500.
> whole stage codegen on(spark.sql.codegen.wholeStage = true):40s
> whole stage codege  off(spark.sql.codegen.wholeStage = false):6s
> After some analysis, I think this is related to the huge Java method (a method 
> of thousands of lines) generated by codegen.
> If I configure -XX:-DontCompileHugeMethods, the performance gets much 
> better (about 7s).
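A hedged sketch of the two knobs discussed above; the config key and JVM flag come from the report, while the shortened query and submit options are illustrative:

{code}
// 1) Disable whole-stage codegen for the current session and re-run the query:
spark.conf.set("spark.sql.codegen.wholeStage", "false")
spark.sql("SELECT sum(COUNTER_57), DIM_1 FROM aggtable GROUP BY DIM_1 LIMIT 100").show()

// 2) Keep codegen on but let the JVM JIT-compile the huge generated methods,
//    e.g. when submitting the application:
//    --conf "spark.driver.extraJavaOptions=-XX:-DontCompileHugeMethods"
//    --conf "spark.executor.extraJavaOptions=-XX:-DontCompileHugeMethods"
{code}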



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20184) performance regression for complex/long sql when enable codegen

2017-04-04 Thread Fei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Wang updated SPARK-20184:
-
Description: 
The performance of the following SQL gets much worse in Spark 2.x with whole-stage 
codegen on than with it off.

SELECT
 sum(COUNTER_57) 
,sum(COUNTER_71) 
,sum(COUNTER_3)  
,sum(COUNTER_70) 
,sum(COUNTER_66) 
,sum(COUNTER_75) 
,sum(COUNTER_69) 
,sum(COUNTER_55) 
,sum(COUNTER_63) 
,sum(COUNTER_68) 
,sum(COUNTER_56) 
,sum(COUNTER_37) 
,sum(COUNTER_51) 
,sum(COUNTER_42) 
,sum(COUNTER_43) 
,sum(COUNTER_1)  
,sum(COUNTER_76) 
,sum(COUNTER_54) 
,sum(COUNTER_44) 
,sum(COUNTER_46) 
,DIM_1 
,DIM_2 
,DIM_3
FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100;

Num of rows of aggtable is about 3500.


whole stage codegen on(spark.sql.codegen.wholeStage = true):40s
whole stage codegen  off(spark.sql.codegen.wholeStage = false):6s


After some analysis, I think this is related to the huge Java method (a method 
of thousands of lines) generated by codegen.
If I configure -XX:-DontCompileHugeMethods, the performance gets much 
better (about 7s).

  was:
The performance of following SQL get much worse in spark 2.x  in contrast with 
codegen off.

SELECT
 sum(COUNTER_57) 
,sum(COUNTER_71) 
,sum(COUNTER_3)  
,sum(COUNTER_70) 
,sum(COUNTER_66) 
,sum(COUNTER_75) 
,sum(COUNTER_69) 
,sum(COUNTER_55) 
,sum(COUNTER_63) 
,sum(COUNTER_68) 
,sum(COUNTER_56) 
,sum(COUNTER_37) 
,sum(COUNTER_51) 
,sum(COUNTER_42) 
,sum(COUNTER_43) 
,sum(COUNTER_1)  
,sum(COUNTER_76) 
,sum(COUNTER_54) 
,sum(COUNTER_44) 
,sum(COUNTER_46) 
,DIM_1 
,DIM_2 
,DIM_3
FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100;

Num of rows of aggtable is about 3500.


whole stage codegen on(spark.sql.codegen.wholeStage = true):40s
whole stage codege  off(spark.sql.codegen.wholeStage = false):6s


After some analysis i think this is related to the huge java method(a java 
method of thousand lines) which generated by codegen.
And If i config -XX:-DontCompileHugeMethods the performance get much 
better(about 7s).


> performance regression for complex/long sql when enable codegen
> ---
>
> Key: SPARK-20184
> URL: https://issues.apache.org/jira/browse/SPARK-20184
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.0, 2.1.0
>Reporter: Fei Wang
>
> The performance of the following SQL gets much worse in Spark 2.x with whole-stage 
> codegen on than with it off.
> SELECT
>sum(COUNTER_57) 
> ,sum(COUNTER_71) 
> ,sum(COUNTER_3)  
> ,sum(COUNTER_70) 
> ,sum(COUNTER_66) 
> ,sum(COUNTER_75) 
> ,sum(COUNTER_69) 
> ,sum(COUNTER_55) 
> ,sum(COUNTER_63) 
> ,sum(COUNTER_68) 
> ,sum(COUNTER_56) 
> ,sum(COUNTER_37) 
> ,sum(COUNTER_51) 
> ,sum(COUNTER_42) 
> ,sum(COUNTER_43) 
> ,sum(COUNTER_1)  
> ,sum(COUNTER_76) 
> ,sum(COUNTER_54) 
> ,sum(COUNTER_44) 
> ,sum(COUNTER_46) 
> ,DIM_1 
> ,DIM_2 
>   ,DIM_3
> FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100;
> Num of rows of aggtable is about 3500.
> whole stage codegen on(spark.sql.codegen.wholeStage = true):40s
> whole stage codegen  off(spark.sql.codegen.wholeStage = false):6s
> After some analysis, I think this is related to the huge Java method (a method 
> of thousands of lines) generated by codegen.
> If I configure -XX:-DontCompileHugeMethods, the performance gets much 
> better (about 7s).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20184) performance regression for complex/long sql when enable whole stage codegen

2017-04-04 Thread Fei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Wang updated SPARK-20184:
-
Summary: performance regression for complex/long sql when enable whole 
stage codegen  (was: performance regression for complex/long sql when enable 
codegen)

> performance regression for complex/long sql when enable whole stage codegen
> ---
>
> Key: SPARK-20184
> URL: https://issues.apache.org/jira/browse/SPARK-20184
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.0, 2.1.0
>Reporter: Fei Wang
>
> The performance of the following SQL gets much worse in Spark 2.x with whole-stage 
> codegen on than with it off.
> SELECT
>sum(COUNTER_57) 
> ,sum(COUNTER_71) 
> ,sum(COUNTER_3)  
> ,sum(COUNTER_70) 
> ,sum(COUNTER_66) 
> ,sum(COUNTER_75) 
> ,sum(COUNTER_69) 
> ,sum(COUNTER_55) 
> ,sum(COUNTER_63) 
> ,sum(COUNTER_68) 
> ,sum(COUNTER_56) 
> ,sum(COUNTER_37) 
> ,sum(COUNTER_51) 
> ,sum(COUNTER_42) 
> ,sum(COUNTER_43) 
> ,sum(COUNTER_1)  
> ,sum(COUNTER_76) 
> ,sum(COUNTER_54) 
> ,sum(COUNTER_44) 
> ,sum(COUNTER_46) 
> ,DIM_1 
> ,DIM_2 
>   ,DIM_3
> FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100;
> Num of rows of aggtable is about 3500.
> whole stage codegen on(spark.sql.codegen.wholeStage = true):40s
> whole stage codegen  off(spark.sql.codegen.wholeStage = false):6s
> After some analysis, I think this is related to the huge Java method (a method 
> of thousands of lines) generated by codegen.
> If I configure -XX:-DontCompileHugeMethods, the performance gets much 
> better (about 7s).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18971) Netty issue may cause the shuffle client hang

2017-04-04 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956209#comment-15956209
 ] 

Shixiong Zhu commented on SPARK-18971:
--

It's not backported because there are too many changes in the new Netty version 
and it's too risky to upgrade in a maintenance release. We may backport it to 
2.1 if we don't see any regressions after Spark 2.2.0 is released.

> Netty issue may cause the shuffle client hang
> -
>
> Key: SPARK-18971
> URL: https://issues.apache.org/jira/browse/SPARK-18971
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>Priority: Minor
> Fix For: 2.2.0
>
>
> Check https://github.com/netty/netty/issues/6153 for details
> You should be able to see a stack trace similar to the following in the executor 
> thread dump.
> {code}
> "shuffle-client-7-4" daemon prio=5 tid=97 RUNNABLE
> at io.netty.util.Recycler$Stack.scavengeSome(Recycler.java:504)
> at io.netty.util.Recycler$Stack.scavenge(Recycler.java:454)
> at io.netty.util.Recycler$Stack.pop(Recycler.java:435)
> at io.netty.util.Recycler.get(Recycler.java:144)
> at 
> io.netty.buffer.PooledUnsafeDirectByteBuf.newInstance(PooledUnsafeDirectByteBuf.java:39)
> at 
> io.netty.buffer.PoolArena$DirectArena.newByteBuf(PoolArena.java:727)
> at io.netty.buffer.PoolArena.allocate(PoolArena.java:140)
> at 
> io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:271)
> at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:177)
> at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:168)
> at 
> io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:129)
> at 
> io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl.allocate(AdaptiveRecvByteBufAllocator.java:104)
> at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:117)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:652)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:575)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:489)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:451)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)
> at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
> at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-20218) '/applications/[app-id]/stages' in REST API,add description.

2017-04-04 Thread guoxiaolongzte (JIRA)
guoxiaolongzte created SPARK-20218:
--

 Summary: '/applications/[app-id]/stages' in REST API,add 
description.
 Key: SPARK-20218
 URL: https://issues.apache.org/jira/browse/SPARK-20218
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 2.1.0
Reporter: guoxiaolongzte
Priority: Minor


The documentation for '/applications/[app-id]/stages' in the REST API should add a 
description of the 'status' query parameter: 
'?status=[active|complete|pending|failed] lists only stages in the given state.'

Because this description is missing, users of this API do not know that they can 
filter the stage list by status.

code:
  @GET
  def stageList(@QueryParam("status") statuses: JList[StageStatus]): Seq[StageData] = {
    val listener = ui.jobProgressListener
    val stageAndStatus = AllStagesResource.stagesAndStatus(ui)
    val adjStatuses = {
      if (statuses.isEmpty()) {
        Arrays.asList(StageStatus.values(): _*)
      } else {
        statuses
      }
    };
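A hedged usage sketch of the parameter being documented; the host, port and application id are placeholders:

{code}
// List only the active stages of a running application via the REST API.
val url = "http://localhost:4040/api/v1/applications/<app-id>/stages?status=active"
val json = scala.io.Source.fromURL(url).mkString
println(json)
{code}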



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20135) spark thriftserver2: no job running but containers not release on yarn

2017-04-04 Thread bruce xu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956235#comment-15956235
 ] 

bruce xu commented on SPARK-20135:
--

OK, Thanks.

> spark thriftserver2: no job running but containers not release on yarn
> --
>
> Key: SPARK-20135
> URL: https://issues.apache.org/jira/browse/SPARK-20135
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1
> Environment: spark 2.0.1 with hadoop 2.6.0 
>Reporter: bruce xu
> Attachments: 0329-1.png, 0329-2.png, 0329-3.png
>
>
> I enabled executor dynamic allocation, but it sometimes doesn't work.
> I set the initial executor count to 50; after the job finished, the cores and 
> memory were not released. 
> In the Spark web UI, the number of active jobs/running tasks/stages is 0, but the 
> executors page shows 1276 cores and 7288 active tasks.
> In the YARN web UI, the thriftserver application still holds 639 running 
> containers without releasing them. 
> This may be a bug. 
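A hedged sketch of the dynamic-allocation settings the report refers to; the values are illustrative, and on YARN the external shuffle service must also be running on the NodeManagers:

{code}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("dynamic-allocation-sketch")
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.initialExecutors", "50")
  .config("spark.dynamicAllocation.executorIdleTimeout", "60s")  // idle executors should be released after this
  .config("spark.shuffle.service.enabled", "true")
  .getOrCreate()
{code}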



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


