[jira] [Resolved] (SPARK-20176) Spark Dataframe UDAF issue
[ https://issues.apache.org/jira/browse/SPARK-20176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-20176. --- Resolution: Cannot Reproduce
> Spark Dataframe UDAF issue
> --
>
> Key: SPARK-20176
> URL: https://issues.apache.org/jira/browse/SPARK-20176
> Project: Spark
> Issue Type: IT Help
> Components: Spark Core
> Affects Versions: 2.0.2
> Reporter: Dinesh Man Amatya
>
> Getting following error in custom UDAF
> Error while decoding: java.util.concurrent.ExecutionException: java.lang.Exception: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 58, Column 33: Incompatible expression types "boolean" and "java.lang.Boolean"
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificSafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificSafeProjection extends org.apache.spark.sql.catalyst.expressions.codegen.BaseProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private MutableRow mutableRow;
> /* 009 */   private Object[] values;
> /* 010 */   private Object[] values1;
> /* 011 */   private org.apache.spark.sql.types.StructType schema;
> /* 012 */   private org.apache.spark.sql.types.StructType schema1;
> /* 013 */
> /* 014 */
> /* 015 */   public SpecificSafeProjection(Object[] references) {
> /* 016 */     this.references = references;
> /* 017 */     mutableRow = (MutableRow) references[references.length - 1];
> /* 018 */
> /* 019 */
> /* 020 */     this.schema = (org.apache.spark.sql.types.StructType) references[0];
> /* 021 */     this.schema1 = (org.apache.spark.sql.types.StructType) references[1];
> /* 022 */   }
> /* 023 */
> /* 024 */   public java.lang.Object apply(java.lang.Object _i) {
> /* 025 */     InternalRow i = (InternalRow) _i;
> /* 026 */
> /* 027 */     values = new Object[2];
> /* 028 */
> /* 029 */     boolean isNull2 = i.isNullAt(0);
> /* 030 */     UTF8String value2 = isNull2 ? null : (i.getUTF8String(0));
> /* 031 */
> /* 032 */     boolean isNull1 = isNull2;
> /* 033 */     final java.lang.String value1 = isNull1 ? null : (java.lang.String) value2.toString();
> /* 034 */     isNull1 = value1 == null;
> /* 035 */     if (isNull1) {
> /* 036 */       values[0] = null;
> /* 037 */     } else {
> /* 038 */       values[0] = value1;
> /* 039 */     }
> /* 040 */
> /* 041 */     boolean isNull5 = i.isNullAt(1);
> /* 042 */     InternalRow value5 = isNull5 ? null : (i.getStruct(1, 2));
> /* 043 */     boolean isNull3 = false;
> /* 044 */     org.apache.spark.sql.Row value3 = null;
> /* 045 */     if (!false && isNull5) {
> /* 046 */
> /* 047 */       final org.apache.spark.sql.Row value6 = null;
> /* 048 */       isNull3 = true;
> /* 049 */       value3 = value6;
> /* 050 */     } else {
> /* 051 */
> /* 052 */       values1 = new Object[2];
> /* 053 */
> /* 054 */       boolean isNull10 = i.isNullAt(1);
> /* 055 */       InternalRow value10 = isNull10 ? null : (i.getStruct(1, 2));
> /* 056 */
> /* 057 */       boolean isNull9 = isNull10 || false;
> /* 058 */       final boolean value9 = isNull9 ? false : (Boolean) value10.isNullAt(0);
> /* 059 */       boolean isNull8 = false;
> /* 060 */       double value8 = -1.0;
> /* 061 */       if (!isNull9 && value9) {
> /* 062 */
> /* 063 */         final double value12 = -1.0;
> /* 064 */         isNull8 = true;
> /* 065 */         value8 = value12;
> /* 066 */       } else {
> /* 067 */
> /* 068 */         boolean isNull14 = i.isNullAt(1);
> /* 069 */         InternalRow value14 = isNull14 ? null : (i.getStruct(1, 2));
> /* 070 */         boolean isNull13 = isNull14;
> /* 071 */         double value13 = -1.0;
> /* 072 */
> /* 073 */         if (!isNull14) {
> /* 074 */
> /* 075 */           if (value14.isNullAt(0)) {
> /* 076 */             isNull13 = true;
> /* 077 */           } else {
> /* 078 */             value13 = value14.getDouble(0);
> /* 079 */           }
> /* 080 */
> /* 081 */         }
> /* 082 */         isNull8 = isNull13;
> /* 083 */         value8 = value13;
> /* 084 */       }
> /* 085 */       if (isNull8) {
> /* 086 */         values1[0] = null;
> /* 087 */       } else {
> /* 088 */         values1[0] = value8;
> /* 089 */       }
> /* 090 */
> /* 091 */       boolean isNull17 = i.isNullAt(1);
> /* 092 */       InternalRow value17 = isNull17 ? null : (i.getStruct(1, 2));
> /* 093 */
> /* 094 */       boolean isNull16 = isNull17 || false;
> /* 095 */       final boolean value16 = isNull16 ? false : (Boolean) value17.isNullAt(1);
> /* 096 */       boolean isNull15 = false;
> /* 097 */       double value15 = -1.0;
> /* 098 */       if (!isNull16 && value16) {
> /* 099 */
> /* 100 */         final double value19 = -1.0;
> /* 101 */         isNull15 = true;
> /* 102 */
[jira] [Commented] (SPARK-17742) Spark Launcher does not get failed state in Listener
[ https://issues.apache.org/jira/browse/SPARK-17742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954698#comment-15954698 ] Daan Van den Nest commented on SPARK-17742: --- Hi, is there already a fix for this issue? Because it effectively renders the SparkLauncher unusable.. Or does anybody know at least a workaround for it? Any thoughts [~anshbansal], [~vanzin] ? Thanks! > Spark Launcher does not get failed state in Listener > - > > Key: SPARK-17742 > URL: https://issues.apache.org/jira/browse/SPARK-17742 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.0.0 >Reporter: Aseem Bansal > > I tried to launch an application using the below code. This is dummy code to > reproduce the problem. I tried exiting spark with status -1, throwing an > exception etc. but in no case did the listener give me failed status. But if > a spark job returns -1 or throws an exception from the main method it should > be considered as a failure. 
> {code}
> package com.example;
> import org.apache.spark.launcher.SparkAppHandle;
> import org.apache.spark.launcher.SparkLauncher;
> import java.io.IOException;
> public class Main2 {
>     public static void main(String[] args) throws IOException, InterruptedException {
>         SparkLauncher launcher = new SparkLauncher()
>                 .setSparkHome("/opt/spark2")
>                 .setAppResource("/home/aseem/projects/testsparkjob/build/libs/testsparkjob-1.0-SNAPSHOT.jar")
>                 .setMainClass("com.example.Main")
>                 .setMaster("local[2]");
>         launcher.startApplication(new MyListener());
>         Thread.sleep(1000 * 60);
>     }
> }
> class MyListener implements SparkAppHandle.Listener {
>     @Override
>     public void stateChanged(SparkAppHandle handle) {
>         System.out.println("state changed " + handle.getState());
>     }
>     @Override
>     public void infoChanged(SparkAppHandle handle) {
>         System.out.println("info changed " + handle.getState());
>     }
> }
> {code}
> The spark job is
> {code}
> package com.example;
> import org.apache.spark.sql.SparkSession;
> import java.io.IOException;
> public class Main {
>     public static void main(String[] args) throws IOException {
>         SparkSession sparkSession = SparkSession
>                 .builder()
>                 .appName("" + System.currentTimeMillis())
>                 .getOrCreate();
>         try {
>             for (int i = 0; i < 15; i++) {
>                 Thread.sleep(1000);
>                 System.out.println("sleeping 1");
>             }
>         } catch (InterruptedException e) {
>             e.printStackTrace();
>         }
>         //sparkSession.stop();
>         System.exit(-1);
>     }
> }
> {code}
-- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17742) Spark Launcher does not get failed state in Listener
[ https://issues.apache.org/jira/browse/SPARK-17742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954714#comment-15954714 ] Aseem Bansal commented on SPARK-17742: -- [~daanvdn] We ended up using Kafka messages to tell the web app that used the launcher to launch the job whether the job was complete or failed, and dropped the Launcher's states, as they are broken.
[jira] [Commented] (SPARK-20199) GradientBoostedTreesModel doesn't have Column Sampling Rate Parameter
[ https://issues.apache.org/jira/browse/SPARK-20199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954796#comment-15954796 ] pralabhkumar commented on SPARK-20199: -- Hi, GBM internally uses Random Forest: GradientBoostedTrees has a boost method which calls DecisionTreeRegressor's train method to build the trees.
{code}
private[ml] def train(data: RDD[LabeledPoint], oldStrategy: OldStrategy): DecisionTreeRegressionModel = {
  val instr = Instrumentation.create(this, data)
  instr.logParams(params: _*)
  val trees = RandomForest.run(data, oldStrategy, numTrees = 1, featureSubsetStrategy = "all",
    seed = $(seed), instr = Some(instr), parentUID = Some(uid))
  val m = trees.head.asInstanceOf[DecisionTreeRegressionModel]
  instr.logSuccess(m)
  m
}
{code}
Here featureSubsetStrategy is hardcoded to "all". Is there any specific reason for that? Shouldn't the property be exposed so that the user can choose featureSubsetStrategy from "auto", "all", "sqrt", "log2" or "onethird"?
> GradientBoostedTreesModel doesn't have Column Sampling Rate Parameter
> --
>
> Key: SPARK-20199
> URL: https://issues.apache.org/jira/browse/SPARK-20199
> Project: Spark
> Issue Type: Improvement
> Components: ML, MLlib
> Affects Versions: 2.1.0
> Reporter: pralabhkumar
> Priority: Minor
>
> Spark GradientBoostedTreesModel doesn't have a column sampling rate parameter. This parameter is available in H2O and XGBoost.
> Sample from H2O.ai: gbmParams._col_sample_rate
> Please provide the parameter.
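For readers unfamiliar with the strategy names above, the sketch below shows what each featureSubsetStrategy value means in terms of how many features are sampled per tree. This is an illustrative stand-alone function, not Spark's implementation; Spark's exact rounding may differ, and "auto" is omitted because its meaning depends on the task.

```java
public class FeatureSubset {
    // Approximate number of features considered per split/tree for each
    // featureSubsetStrategy value (illustrative; Spark's rounding may differ).
    static int numFeatures(String strategy, int total) {
        switch (strategy) {
            case "all":      return total;
            case "sqrt":     return (int) Math.ceil(Math.sqrt(total));
            case "log2":     return Math.max(1, (int) (Math.log(total) / Math.log(2)));
            case "onethird": return (int) Math.ceil(total / 3.0);
            default: throw new IllegalArgumentException("unknown strategy: " + strategy);
        }
    }

    public static void main(String[] args) {
        System.out.println(numFeatures("all", 100));      // 100 -- what GBT hardcodes today
        System.out.println(numFeatures("sqrt", 100));     // 10
        System.out.println(numFeatures("onethird", 100)); // 34
    }
}
```

With the hardcoded "all", every tree in the ensemble sees every column; exposing the parameter would let GBT subsample columns the way Random Forest already does.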
[jira] [Updated] (SPARK-20155) CSV-files with quoted quotes can't be parsed, if delimiter follows quoted quote
[ https://issues.apache.org/jira/browse/SPARK-20155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rick Moritz updated SPARK-20155: Summary: CSV-files with quoted quotes can't be parsed, if delimiter follows quoted quote (was: CSV-files with quoted quotes can't be parsed, if delimiter followes quoted quote)
> CSV-files with quoted quotes can't be parsed, if delimiter follows quoted quote
> --
>
> Key: SPARK-20155
> URL: https://issues.apache.org/jira/browse/SPARK-20155
> Project: Spark
> Issue Type: Bug
> Components: Input/Output
> Affects Versions: 2.0.0
> Reporter: Rick Moritz
>
> According to https://tools.ietf.org/html/rfc4180#section-2:
> 7. If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example:
> "aaa","b""bb","ccc"
> This currently works as is, but the following does not:
> "aaa","b""b,b","ccc"
> while "aaa","b\"b,b","ccc" does get parsed.
> I assume this happens because quotes are currently being parsed in pairs, and that somehow ends up unquoting the delimiter.
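The RFC 4180 rule quoted above can be made concrete with a minimal quote-aware field splitter: inside a quoted field, "" is a literal quote and a comma is data. This is an illustrative sketch of the state machine the RFC implies, not Spark's CSV parser; the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class Rfc4180Split {
    // Splits one CSV line into fields, honoring RFC 4180 quoting:
    // a quoted field may contain commas, and "" inside quotes is a literal ".
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        cur.append('"');   // escaped quote: emit one " and skip the pair
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    cur.append(c);         // a comma here is data, not a delimiter
                }
            } else if (c == '"') {
                inQuotes = true;           // opening quote
            } else if (c == ',') {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    public static void main(String[] args) {
        // The failing example from the report: "aaa","b""b,b","ccc"
        System.out.println(split("\"aaa\",\"b\"\"b,b\",\"ccc\""));
        // [aaa, b"b,b, ccc]
    }
}
```

The key point is that the escaped-quote check must look one character ahead rather than pairing quotes greedily; pairing them greedily is exactly what makes the delimiter after `b""b` leak out of the quoted field.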
[jira] [Assigned] (SPARK-20156) Java String toLowerCase "Turkish locale bug" causes Spark problems
[ https://issues.apache.org/jira/browse/SPARK-20156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-20156: - Assignee: Sean Owen Summary: Java String toLowerCase "Turkish locale bug" causes Spark problems (was: Local dependent library used for upper and lowercase conversions.) I retitled this; please refer to things like http://mattryall.net/blog/2009/02/the-infamous-turkish-locale-bug for back-story on this particular issue. I believe the best change is to make all case-changing operations use Locale.ROOT.
> Java String toLowerCase "Turkish locale bug" causes Spark problems
> --
>
> Key: SPARK-20156
> URL: https://issues.apache.org/jira/browse/SPARK-20156
> Project: Spark
> Issue Type: Bug
> Components: Spark Shell
> Affects Versions: 2.1.0
> Environment: Ubuntu 16.04
> Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_121)
> Reporter: Serkan Taş
> Assignee: Sean Owen
> Attachments: sprk_shell.txt
>
> If the regional setting of the operating system is Turkish, the famous Java locale problem occurs (https://jira.atlassian.com/browse/CONF-5931 or https://issues.apache.org/jira/browse/AVRO-1493).
> e.g.:
> "SERDEINFO" lowers to "serdeınfo"
> "uniquetable" uppers to "UNİQUETABLE"
> workaround:
> add -Duser.country=US -Duser.language=en to the end of the line SPARK_SUBMIT_OPTS="$SPARK_SUBMIT_OPTS -Dscala.usejavacp=true" in spark-shell.sh
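The behavior in this report is plain java.lang.String behavior and can be reproduced without Spark: in a Turkish locale, 'I' lowercases to dotless 'ı' (U+0131) and 'i' uppercases to dotted 'İ' (U+0130). The class name below is illustrative.

```java
import java.util.Locale;

public class TurkishLocaleDemo {
    public static void main(String[] args) {
        Locale turkish = new Locale("tr", "TR");

        // Locale-sensitive casing: the Turkish dotted/dotless i rules kick in.
        System.out.println("SERDEINFO".toLowerCase(turkish));     // serdeınfo
        System.out.println("uniquetable".toUpperCase(turkish));   // UNİQUETABLE

        // Locale.ROOT gives the stable, locale-independent result that
        // internal identifiers (table names, SerDe keys, ...) need.
        System.out.println("SERDEINFO".toLowerCase(Locale.ROOT)); // serdeinfo
    }
}
```

This is why the fix is to pass Locale.ROOT explicitly everywhere case conversion is applied to identifiers, rather than relying on the JVM's default locale (which the -Duser.country/-Duser.language workaround merely overrides).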
[jira] [Commented] (SPARK-20156) Java String toLowerCase "Turkish locale bug" causes Spark problems
[ https://issues.apache.org/jira/browse/SPARK-20156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954850#comment-15954850 ] Apache Spark commented on SPARK-20156: -- User 'srowen' has created a pull request for this issue: https://github.com/apache/spark/pull/17527
[jira] [Assigned] (SPARK-20156) Java String toLowerCase "Turkish locale bug" causes Spark problems
[ https://issues.apache.org/jira/browse/SPARK-20156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20156: Assignee: Apache Spark (was: Sean Owen)
[jira] [Assigned] (SPARK-20156) Java String toLowerCase "Turkish locale bug" causes Spark problems
[ https://issues.apache.org/jira/browse/SPARK-20156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20156: Assignee: Sean Owen (was: Apache Spark)
[jira] [Resolved] (SPARK-20190) '/applications/[app-id]/jobs' in rest api,status should be [running|succeeded|failed|unknown]
[ https://issues.apache.org/jira/browse/SPARK-20190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-20190. --- Resolution: Fixed Fix Version/s: 2.2.0, 2.1.1 Issue resolved by pull request 17507 [https://github.com/apache/spark/pull/17507]
> '/applications/[app-id]/jobs' in rest api, status should be [running|succeeded|failed|unknown]
> --
>
> Key: SPARK-20190
> URL: https://issues.apache.org/jira/browse/SPARK-20190
> Project: Spark
> Issue Type: Bug
> Components: Documentation
> Affects Versions: 2.1.0
> Reporter: guoxiaolongzte
> Assignee: guoxiaolongzte
> Priority: Trivial
> Fix For: 2.1.1, 2.2.0
>
> For '/applications/[app-id]/jobs' in the REST API, the documented status values should be '[running|succeeded|failed|unknown]'.
> Currently the docs say '[complete|succeeded|failed]', but for '/applications/[app-id]/jobs?status=complete' the server returns 'HTTP ERROR 404'.
> Added '?status=running' and '?status=unknown'.
> code:
> public enum JobExecutionStatus {
>   RUNNING,
>   SUCCEEDED,
>   FAILED,
>   UNKNOWN;
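The mismatch is easy to see with a self-contained mirror of the JobExecutionStatus enum quoted above: "complete" is not one of its constants, hence the 404 when it is passed as a query parameter. The helper method is invented for the example.

```java
import java.util.Locale;

// Local mirror of Spark's JobExecutionStatus constants, for illustration only.
enum JobExecutionStatus { RUNNING, SUCCEEDED, FAILED, UNKNOWN }

public class StatusCheck {
    // Returns true iff the REST ?status= value names an actual enum constant.
    static boolean isValidStatus(String s) {
        try {
            JobExecutionStatus.valueOf(s.toUpperCase(Locale.ROOT));
            return true;
        } catch (IllegalArgumentException e) {
            return false;  // this is the case that surfaced as HTTP 404
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidStatus("running"));  // true
        System.out.println(isValidStatus("complete")); // false
    }
}
```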
[jira] [Commented] (SPARK-20193) Selecting empty struct causes ExpressionEncoder error.
[ https://issues.apache.org/jira/browse/SPARK-20193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954878#comment-15954878 ] Adrian Ionescu commented on SPARK-20193: Thanks for the workaround, but, sorry, this is not good enough. I agree that an empty struct is not very useful, but if it's not supported then the docs should say so and the error message should be clear. In my case, I'm building this struct dynamically, based on user input, so it may or may not be empty. Right now I have to special case it, but that introduces unnecessary complexity and makes the code less readable. > Selecting empty struct causes ExpressionEncoder error. > -- > > Key: SPARK-20193 > URL: https://issues.apache.org/jira/browse/SPARK-20193 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.0 >Reporter: Adrian Ionescu > Labels: struct > > {{def struct(cols: Column*): Column}} > Given the above signature and the lack of any note in the docs saying that a > struct with no columns is not supported, I would expect the following to work: > {{spark.range(3).select(col("id"), struct().as("empty_struct")).collect}} > However, this results in: > {quote} > java.lang.AssertionError: assertion failed: each serializer expression should > contains at least one `BoundReference` > at scala.Predef$.assert(Predef.scala:170) > at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$$anonfun$11.apply(ExpressionEncoder.scala:240) > at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$$anonfun$11.apply(ExpressionEncoder.scala:238) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) > at scala.collection.immutable.List.foreach(List.scala:381) > at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241) > at scala.collection.immutable.List.flatMap(List.scala:344) > at > 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.<init>(ExpressionEncoder.scala:238) > at > org.apache.spark.sql.catalyst.encoders.RowEncoder$.apply(RowEncoder.scala:63) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64) > at > org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:2837) > at org.apache.spark.sql.Dataset.select(Dataset.scala:1131) > ... 39 elided > {quote}
[jira] [Updated] (SPARK-20193) Selecting empty struct causes ExpressionEncoder error.
[ https://issues.apache.org/jira/browse/SPARK-20193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-20193: -- Labels: (was: struct) Priority: Minor (was: Major) Component/s: (was: Spark Core) SQL Documentation Issue Type: Improvement (was: Bug) This sounds like a doc issue as it's not clear that this is meaningful to support.
[jira] [Commented] (SPARK-17742) Spark Launcher does not get failed state in Listener
[ https://issues.apache.org/jira/browse/SPARK-17742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954903#comment-15954903 ] Daan Van den Nest commented on SPARK-17742: --- Thanks for the advice [~anshbansal], I'll give that a try. A question to the Spark developers: is a fix for this bug already on the roadmap for a next release of Spark?
[jira] [Commented] (SPARK-20203) Change default maxPatternLength value to Int.MaxValue in PrefixSpan
[ https://issues.apache.org/jira/browse/SPARK-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954904#comment-15954904 ] Nick Pentreath commented on SPARK-20203: -- I see there is a comment in the code that says: {{// TODO: support unbounded pattern length when maxPatternLength = 0}}. But the same thing can essentially be achieved by setting the pattern length to {{Int.MaxValue}}, as Sean has previously said, so I don't really think this is a valid work item (in fact, that comment should probably be removed). Is an unbounded default really better (or worse) from an API / user-facing perspective? There are arguments either way, but to be honest I see nothing compelling enough to warrant a change here.
> Change default maxPatternLength value to Int.MaxValue in PrefixSpan
> --
>
> Key: SPARK-20203
> URL: https://issues.apache.org/jira/browse/SPARK-20203
> Project: Spark
> Issue Type: Wish
> Components: MLlib
> Affects Versions: 2.1.0
> Reporter: Cyril de Vogelaere
> Priority: Trivial
> Original Estimate: 0h
> Remaining Estimate: 0h
>
> I think changing the default value to Int.MaxValue would be more user friendly, at least for new users. Personally, when I run an algorithm, I expect it to find all solutions by default, and a limited number of them when I set the parameters to do so.
> The current implementation limits the length of solution patterns to 10, preventing all solutions from being printed when running even slightly large datasets.
> I feel that should be changed, but since this would change the default behavior of PrefixSpan, I think asking for the community's opinion should come first. So, what do you think?
[jira] [Commented] (SPARK-17742) Spark Launcher does not get failed state in Listener
[ https://issues.apache.org/jira/browse/SPARK-17742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954905#comment-15954905 ] Sean Owen commented on SPARK-17742: --- [~daanvdn] did you implement Marcelo's suggestion? this is pretty WYSIWYG -- if you want to push a change, investigate it and test and then open a PR. If you don't see anything here, nobody's working on it.
[jira] [Resolved] (SPARK-20203) Change default maxPatternLength value to Int.MaxValue in PrefixSpan
[ https://issues.apache.org/jira/browse/SPARK-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-20203. --- Resolution: Won't Fix > Change default maxPatternLength value to Int.MaxValue in PrefixSpan > --- > > Key: SPARK-20203 > URL: https://issues.apache.org/jira/browse/SPARK-20203 > Project: Spark > Issue Type: Wish > Components: MLlib >Affects Versions: 2.1.0 >Reporter: Cyril de Vogelaere >Priority: Trivial > Original Estimate: 0h > Remaining Estimate: 0h > > I think changing the default value to Int.MaxValue would be more > user-friendly, at least for new users. > Personally, when I run an algorithm, I expect it to find all solutions by > default, and a limited number of them when I set the parameters to do so. > The current implementation limits the length of solution patterns to 10, > thus preventing all solutions from being found when running on slightly large > datasets. > I feel that this should be changed, but since it would change the default > behavior of PrefixSpan, I think asking for the community's opinion should > come first. So, what do you think?
[jira] [Commented] (SPARK-20193) Selecting empty struct causes ExpressionEncoder error.
[ https://issues.apache.org/jira/browse/SPARK-20193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954911#comment-15954911 ] Adrian Ionescu commented on SPARK-20193: In that case, it would be better to change the signature of the function: {{def struct(col: Column, cols: Column*): Column}} > Selecting empty struct causes ExpressionEncoder error. > -- > > Key: SPARK-20193 > URL: https://issues.apache.org/jira/browse/SPARK-20193 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL >Affects Versions: 2.1.0 >Reporter: Adrian Ionescu >Priority: Minor > > {{def struct(cols: Column*): Column}} > Given the above signature and the lack of any note in the docs saying that a > struct with no columns is not supported, I would expect the following to work: > {{spark.range(3).select(col("id"), struct().as("empty_struct")).collect}} > However, this results in: > {quote} > java.lang.AssertionError: assertion failed: each serializer expression should > contains at least one `BoundReference` > at scala.Predef$.assert(Predef.scala:170) > at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$$anonfun$11.apply(ExpressionEncoder.scala:240) > at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$$anonfun$11.apply(ExpressionEncoder.scala:238) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) > at scala.collection.immutable.List.foreach(List.scala:381) > at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241) > at scala.collection.immutable.List.flatMap(List.scala:344) > at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.<init>(ExpressionEncoder.scala:238) > at > org.apache.spark.sql.catalyst.encoders.RowEncoder$.apply(RowEncoder.scala:63) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64) > at > 
org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:2837) > at org.apache.spark.sql.Dataset.select(Dataset.scala:1131) > ... 39 elided > {quote}
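The signature change proposed in the comment turns the zero-column call into a compile-time error instead of the runtime assertion failure above. A sketch of the "at least one" varargs pattern in plain Java, with a stand-in {{Column}} class rather than Spark's:

```java
public class StructSig {
    // Stand-in for Spark's Column, just for illustration.
    static class Column {
        final String name;
        Column(String name) { this.name = name; }
    }

    // struct(col, cols...) requires at least one argument, so struct()
    // simply cannot be written -- which is the point of the proposal.
    static String struct(Column col, Column... cols) {
        StringBuilder sb = new StringBuilder("struct(").append(col.name);
        for (Column c : cols) sb.append(", ").append(c.name);
        return sb.append(")").toString();
    }

    public static void main(String[] args) {
        System.out.println(struct(new Column("id"), new Column("value")));
        // struct();  // would not compile
    }
}
```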
[jira] [Commented] (SPARK-20206) spark.ui.killEnabled=false property doesn't reflect on task/stages
[ https://issues.apache.org/jira/browse/SPARK-20206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954932#comment-15954932 ] Sean Owen commented on SPARK-20206: --- Can you first confirm in the Environment tab that the property is being set where you think it is? Can you show a screenshot or further describe what page you're looking at? > spark.ui.killEnabled=false property doesn't reflect on task/stages > -- > > Key: SPARK-20206 > URL: https://issues.apache.org/jira/browse/SPARK-20206 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.1.0 >Reporter: srinivasan >Priority: Minor > > The spark.ui.killEnabled=false property is not reflected on active tasks and > stages. The kill hyperlink is still enabled on active tasks and stages.
[jira] [Created] (SPARK-20210) Tests aborted in Spark SQL on ppc64le
Sonia Garudi created SPARK-20210: Summary: Tests aborted in Spark SQL on ppc64le Key: SPARK-20210 URL: https://issues.apache.org/jira/browse/SPARK-20210 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.2.0 Environment: Ubuntu 14.04 ppc64le $ java -version openjdk version "1.8.0_111" OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-3~14.04.1-b14) OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode) Reporter: Sonia Garudi The tests get aborted with the following error: {code} *** RUN ABORTED *** org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47) at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62) at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76) at org.apache.spark.storage.BlockManagerMaster.removeRdd(BlockManagerMaster.scala:125) at org.apache.spark.SparkContext.unpersistRDD(SparkContext.scala:1792) at org.apache.spark.rdd.RDD.unpersist(RDD.scala:216) at org.apache.spark.sql.execution.CacheManager$$anonfun$clearCache$1$$anonfun$apply$mcV$sp$1.apply(CacheManager.scala:75) at org.apache.spark.sql.execution.CacheManager$$anonfun$clearCache$1$$anonfun$apply$mcV$sp$1.apply(CacheManager.scala:75) ... Cause: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201) at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) at org.apache.spark.storage.BlockManagerMaster.removeRdd(BlockManagerMaster.scala:125) at org.apache.spark.SparkContext.unpersistRDD(SparkContext.scala:1792) at org.apache.spark.rdd.RDD.unpersist(RDD.scala:216) at org.apache.spark.sql.execution.CacheManager$$anonfun$clearCache$1$$anonfun$apply$mcV$sp$1.apply(CacheManager.scala:75) at org.apache.spark.sql.execution.CacheManager$$anonfun$clearCache$1$$anonfun$apply$mcV$sp$1.apply(CacheManager.scala:75) at scala.collection.Iterator$class.foreach(Iterator.scala:893) ... {code}
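If the aborts turn out to be caused by a slow test host rather than anything PPC-specific, one workaround is to raise the RPC timeouts for the test run. The property names below are real Spark settings (the failure message itself points at {{spark.rpc.askTimeout}}), but the values are only illustrative:

```
# spark-defaults.conf -- illustrative values, not a recommendation
spark.rpc.askTimeout      300s
spark.network.timeout     300s
```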
[jira] [Updated] (SPARK-20210) Scala tests aborted in Spark SQL on ppc64le
[ https://issues.apache.org/jira/browse/SPARK-20210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sonia Garudi updated SPARK-20210: - Summary: Scala tests aborted in Spark SQL on ppc64le (was: Tests aborted in Spark SQL on ppc64le) > Scala tests aborted in Spark SQL on ppc64le > --- > > Key: SPARK-20210 > URL: https://issues.apache.org/jira/browse/SPARK-20210 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 > Environment: Ubuntu 14.04 ppc64le > $ java -version > openjdk version "1.8.0_111" > OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-3~14.04.1-b14) > OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode) >Reporter: Sonia Garudi > Labels: ppc64le > > The tests get aborted with the following error: > {code} > *** RUN ABORTED *** > org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 > seconds]. This timeout is controlled by spark.rpc.askTimeout > at > org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76) > at > org.apache.spark.storage.BlockManagerMaster.removeRdd(BlockManagerMaster.scala:125) > at > org.apache.spark.SparkContext.unpersistRDD(SparkContext.scala:1792) > at org.apache.spark.rdd.RDD.unpersist(RDD.scala:216) > at > org.apache.spark.sql.execution.CacheManager$$anonfun$clearCache$1$$anonfun$apply$mcV$sp$1.apply(CacheManager.scala:75) > at > org.apache.spark.sql.execution.CacheManager$$anonfun$clearCache$1$$anonfun$apply$mcV$sp$1.apply(CacheManager.scala:75) > ... > Cause: java.util.concurrent.TimeoutException: Futures timed out after > [120 seconds] > at > scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) > at > scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) > at > org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) > at > org.apache.spark.storage.BlockManagerMaster.removeRdd(BlockManagerMaster.scala:125) > at > org.apache.spark.SparkContext.unpersistRDD(SparkContext.scala:1792) > at org.apache.spark.rdd.RDD.unpersist(RDD.scala:216) > at > org.apache.spark.sql.execution.CacheManager$$anonfun$clearCache$1$$anonfun$apply$mcV$sp$1.apply(CacheManager.scala:75) > at > org.apache.spark.sql.execution.CacheManager$$anonfun$clearCache$1$$anonfun$apply$mcV$sp$1.apply(CacheManager.scala:75) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > ... > {code}
[jira] [Updated] (SPARK-20210) Scala tests aborted in Spark SQL on ppc64le
[ https://issues.apache.org/jira/browse/SPARK-20210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-20210: -- Priority: Minor (was: Major) Issue Type: Improvement (was: Bug) Are you sure it's due to PPC, or just a flaky test? BTW I'm not sure we would consider PPC supported, so I won't mark this as a bug. However, if there are simple ways to make it just work, that's fine. > Scala tests aborted in Spark SQL on ppc64le > --- > > Key: SPARK-20210 > URL: https://issues.apache.org/jira/browse/SPARK-20210 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.2.0 > Environment: Ubuntu 14.04 ppc64le > $ java -version > openjdk version "1.8.0_111" > OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-3~14.04.1-b14) > OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode) >Reporter: Sonia Garudi >Priority: Minor > Labels: ppc64le > > The tests get aborted with the following error: > {code} > *** RUN ABORTED *** > org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 > seconds]. This timeout is controlled by spark.rpc.askTimeout > at > org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76) > at > org.apache.spark.storage.BlockManagerMaster.removeRdd(BlockManagerMaster.scala:125) > at > org.apache.spark.SparkContext.unpersistRDD(SparkContext.scala:1792) > at org.apache.spark.rdd.RDD.unpersist(RDD.scala:216) > at > org.apache.spark.sql.execution.CacheManager$$anonfun$clearCache$1$$anonfun$apply$mcV$sp$1.apply(CacheManager.scala:75) > at > org.apache.spark.sql.execution.CacheManager$$anonfun$clearCache$1$$anonfun$apply$mcV$sp$1.apply(CacheManager.scala:75) > ... > Cause: java.util.concurrent.TimeoutException: Futures timed out after > [120 seconds] > at > scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) > at > scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) > at > org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) > at > org.apache.spark.storage.BlockManagerMaster.removeRdd(BlockManagerMaster.scala:125) > at > org.apache.spark.SparkContext.unpersistRDD(SparkContext.scala:1792) > at org.apache.spark.rdd.RDD.unpersist(RDD.scala:216) > at > org.apache.spark.sql.execution.CacheManager$$anonfun$clearCache$1$$anonfun$apply$mcV$sp$1.apply(CacheManager.scala:75) > at > org.apache.spark.sql.execution.CacheManager$$anonfun$clearCache$1$$anonfun$apply$mcV$sp$1.apply(CacheManager.scala:75) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > ... > {code}
[jira] [Commented] (SPARK-20206) spark.ui.killEnabled=false property doesn't reflect on task/stages
[ https://issues.apache.org/jira/browse/SPARK-20206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954956#comment-15954956 ] srinivasan commented on SPARK-20206: Hi, I would like to close this issue. I had misconfigured it: the property was not set where I submit the Spark job. Sorry. > spark.ui.killEnabled=false property doesn't reflect on task/stages > -- > > Key: SPARK-20206 > URL: https://issues.apache.org/jira/browse/SPARK-20206 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.1.0 >Reporter: srinivasan >Priority: Minor > > The spark.ui.killEnabled=false property is not reflected on active tasks and > stages. The kill hyperlink is still enabled on active tasks and stages.
[jira] [Closed] (SPARK-20206) spark.ui.killEnabled=false property doesn't reflect on task/stages
[ https://issues.apache.org/jira/browse/SPARK-20206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] srinivasan closed SPARK-20206. -- Resolution: Not A Bug > spark.ui.killEnabled=false property doesn't reflect on task/stages > -- > > Key: SPARK-20206 > URL: https://issues.apache.org/jira/browse/SPARK-20206 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.1.0 >Reporter: srinivasan >Priority: Minor > > The spark.ui.killEnabled=false property is not reflected on active tasks and > stages. The kill hyperlink is still enabled on active tasks and stages.
[jira] [Commented] (SPARK-17742) Spark Launcher does not get failed state in Listener
[ https://issues.apache.org/jira/browse/SPARK-17742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954988#comment-15954988 ] Daan Van den Nest commented on SPARK-17742: --- Hi [~srowen], unfortunately I'm a java dev, so changing stuff in a scala code base is quite a bit out of my comfort zone. In case that changes, I will definitely look into it :-), Cheers > Spark Launcher does not get failed state in Listener > - > > Key: SPARK-17742 > URL: https://issues.apache.org/jira/browse/SPARK-17742 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.0.0 >Reporter: Aseem Bansal > > I tried to launch an application using the below code. This is dummy code to > reproduce the problem. I tried exiting spark with status -1, throwing an > exception etc. but in no case did the listener give me failed status. But if > a spark job returns -1 or throws an exception from the main method it should > be considered as a failure. > {code} > package com.example; > import org.apache.spark.launcher.SparkAppHandle; > import org.apache.spark.launcher.SparkLauncher; > import java.io.IOException; > public class Main2 { > public static void main(String[] args) throws IOException, > InterruptedException { > SparkLauncher launcher = new SparkLauncher() > .setSparkHome("/opt/spark2") > > .setAppResource("/home/aseem/projects/testsparkjob/build/libs/testsparkjob-1.0-SNAPSHOT.jar") > .setMainClass("com.example.Main") > .setMaster("local[2]"); > launcher.startApplication(new MyListener()); > Thread.sleep(1000 * 60); > } > } > class MyListener implements SparkAppHandle.Listener { > @Override > public void stateChanged(SparkAppHandle handle) { > System.out.println("state changed " + handle.getState()); > } > @Override > public void infoChanged(SparkAppHandle handle) { > System.out.println("info changed " + handle.getState()); > } > } > {code} > The spark job is > {code} > package com.example; > import 
org.apache.spark.sql.SparkSession; > import java.io.IOException; > public class Main { > public static void main(String[] args) throws IOException { > SparkSession sparkSession = SparkSession > .builder() > .appName("" + System.currentTimeMillis()) > .getOrCreate(); > try { > for (int i = 0; i < 15; i++) { > Thread.sleep(1000); > System.out.println("sleeping 1"); > } > } catch (InterruptedException e) { > e.printStackTrace(); > } > //sparkSession.stop(); > System.exit(-1); > } > } > {code}
[jira] [Resolved] (SPARK-20198) Remove the inconsistency in table/function name conventions in SparkSession.Catalog APIs
[ https://issues.apache.org/jira/browse/SPARK-20198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-20198. - Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 17518 [https://github.com/apache/spark/pull/17518] > Remove the inconsistency in table/function name conventions in > SparkSession.Catalog APIs > > > Key: SPARK-20198 > URL: https://issues.apache.org/jira/browse/SPARK-20198 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2, 2.1.1, 2.2.0 >Reporter: Xiao Li >Assignee: Xiao Li > Fix For: 2.2.0 > > > As observed by @felixcheung, in `SparkSession`.`Catalog` APIs, we have > different conventions/rules for table/function identifiers/names. Most APIs > accept the qualified name (i.e., `databaseName`.`tableName` or > `databaseName`.`functionName`). However, the following five APIs do not > accept it: > - def listColumns(tableName: String): Dataset[Column] > - def getTable(tableName: String): Table > - def getFunction(functionName: String): Function > - def tableExists(tableName: String): Boolean > - def functionExists(functionName: String): Boolean > It is desirable to make them more consistent with the other Catalog APIs, > update the function/API comments, and add `@params` to clarify the > inputs we allow.
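The inconsistency is about whether a name like {{databaseName.tableName}} is split into its parts or treated as a single bare name. A toy sketch of the qualified-name split in plain Java (illustrative only; Spark's actual parser also handles backquoted identifiers and multi-part names):

```java
public class QualifiedName {
    // Split "db.table" into {db, table}; a bare name gets a null database part.
    static String[] parse(String name) {
        int dot = name.indexOf('.');
        return dot < 0 ? new String[] { null, name }
                       : new String[] { name.substring(0, dot), name.substring(dot + 1) };
    }

    public static void main(String[] args) {
        String[] parts = parse("sales.orders");
        System.out.println(parts[0] + "/" + parts[1]);
    }
}
```

The five APIs listed above were the ones that, before the fix, skipped this split and treated the whole string as a table or function name.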
[jira] [Resolved] (SPARK-20185) csv decompressed incorrectly with extension other than 'gz'
[ https://issues.apache.org/jira/browse/SPARK-20185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-20185. --- Resolution: Not A Problem > csv decompressed incorrectly with extension other than 'gz' > --- > > Key: SPARK-20185 > URL: https://issues.apache.org/jira/browse/SPARK-20185 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 1.6.3, 2.0.0, 2.0.1, 2.0.2, 2.1.0 >Reporter: Ran Mingxuan >Priority: Minor > Original Estimate: 168h > Remaining Estimate: 168h > > With the code below: > val start_time = System.currentTimeMillis() > val gzFile = spark.read > .format("com.databricks.spark.csv") > .option("header", "false") > .option("inferSchema", "false") > .option("codec", "gzip") > .load("/foo/someCsvFile.gz.bak") > gzFile.repartition(1).write.mode("overwrite").parquet("/foo/") > I got an error even though I indicated the codec: > WARN util.NativeCodeLoader: Unable to load native-hadoop library for your > platform... using builtin-java classes where applicable > 17/03/23 15:44:55 WARN ipc.Client: Exception encountered while connecting to > the server : > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): > Operation category READ is not supported in state standby. 
Visit > https://s.apache.org/sbnn-error > 17/03/23 15:44:58 ERROR executor.Executor: Exception in task 2.0 in stage > 12.0 (TID 977) > java.lang.NullPointerException > at > org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:109) > at > org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:94) > at > org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$2.apply(CSVFileFormat.scala:167) > at > org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$2.apply(CSVFileFormat.scala:166) > I had to add the extension to GzipCodec to make my code run: > import org.apache.hadoop.io.compress.GzipCodec > class BakGzipCodec extends GzipCodec { > override def getDefaultExtension(): String = ".gz.bak" > } > I suppose the file loader should pick the codec based on the option first, > and only then fall back to the extension.
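The "Not A Problem" resolution follows from how Hadoop selects a decompression codec: by matching the file name's suffix against each registered codec's default extension, not from any reader option. A toy sketch of that suffix lookup (illustrative, not the actual {{CompressionCodecFactory}} code) shows why a {{.gz.bak}} file falls through and is read as if uncompressed:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CodecLookup {
    // Registered codecs keyed by their default file-name extension.
    static final Map<String, String> CODECS = new LinkedHashMap<>();
    static {
        CODECS.put(".gz", "GzipCodec");
        CODECS.put(".bz2", "BZip2Codec");
    }

    static String codecFor(String fileName) {
        for (Map.Entry<String, String> e : CODECS.entrySet())
            if (fileName.endsWith(e.getKey())) return e.getValue();
        return "none";  // no suffix match: the file is read uncompressed
    }

    public static void main(String[] args) {
        System.out.println(codecFor("someCsvFile.gz"));
        System.out.println(codecFor("someCsvFile.gz.bak"));
    }
}
```

That is exactly why the reporter's {{BakGzipCodec}} workaround (overriding {{getDefaultExtension}} to return ".gz.bak") makes the file decompress correctly.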
[jira] [Resolved] (SPARK-20144) spark.read.parquet no longer maintains ordering of the data
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-20144. --- Resolution: Not A Problem > spark.read.parquet no longer maintains ordering of the data > - > > Key: SPARK-20144 > URL: https://issues.apache.org/jira/browse/SPARK-20144 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2 >Reporter: Li Jin > > Hi, we are trying to upgrade Spark from 1.6.3 to 2.0.2. One issue we found is > that when we read parquet files in 2.0.2, the ordering of rows in the resulting > dataframe is not the same as the ordering of rows in the dataframe that the > parquet file was produced from. > This is because FileSourceStrategy.scala combines the parquet files into > fewer partitions and also reorders them. This breaks our workflows because > they assume the ordering of the data. > Is this considered a bug? Also, FileSourceStrategy and FileSourceScanExec > changed quite a bit from 2.0.2 to 2.1, so we are not sure if this is an issue with > 2.1.
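The resolution reflects that partition order after a file read is not a contract. The portable fix is to carry an explicit ordering column when writing and sort on it after reading; in Spark that would be something like {{df.orderBy("row_id")}} (the column name here is illustrative), sketched below on plain Java lists:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class RestoreOrder {
    public static void main(String[] args) {
        // Rows as (row_id, value) pairs, arriving in arbitrary partition order.
        List<int[]> rows = new ArrayList<>(Arrays.asList(
                new int[] {2, 30}, new int[] {0, 10}, new int[] {1, 20}));

        // Restore the intended order by sorting on the explicit row_id column.
        rows.sort(Comparator.comparingInt(r -> r[0]));

        for (int[] r : rows) System.out.println(r[0] + ":" + r[1]);
    }
}
```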
[jira] [Reopened] (SPARK-14492) Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; its not backwards compatible with earlier version
[ https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil Rangwani reopened SPARK-14492: It was closed with the comment "should work"; needs looking into. > Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; its not > backwards compatible with earlier version > --- > > Key: SPARK-14492 > URL: https://issues.apache.org/jira/browse/SPARK-14492 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Sunil Rangwani >Priority: Critical > > Spark SQL when configured with a Hive version lower than 1.2.0 throws a > java.lang.NoSuchFieldError for the field METASTORE_CLIENT_SOCKET_LIFETIME > because this field was introduced in Hive 1.2.0, so it's not possible to use > a Hive metastore version lower than 1.2.0 with Spark. The details of the Hive > changes can be found here: https://issues.apache.org/jira/browse/HIVE-9508 > {code:java} > Exception in thread "main" java.lang.NoSuchFieldError: > METASTORE_CLIENT_SOCKET_LIFETIME > at > org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:500) > at > org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250) > at > org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237) > at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:441) > at > org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272) > at > org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at scala.collection.AbstractIterable.foreach(Iterable.scala:54) > at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:271) > at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:90) > at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:58) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.<init>(SparkSQLCLIDriver.scala:267) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:139) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > {code}
[jira] [Issue Comment Deleted] (SPARK-14492) Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; its not backwards compatible with earlier version
[ https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil Rangwani updated SPARK-14492: --- Comment: was deleted (was: It was closed with the comment "should work" Needs looking into) > Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; its not > backwards compatible with earlier version > --- > > Key: SPARK-14492 > URL: https://issues.apache.org/jira/browse/SPARK-14492 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Sunil Rangwani >Priority: Critical > > Spark SQL when configured with a Hive version lower than 1.2.0 throws a > java.lang.NoSuchFieldError for the field METASTORE_CLIENT_SOCKET_LIFETIME > because this field was introduced in Hive 1.2.0 so its not possible to use > Hive metastore version lower than 1.2.0 with Spark. The details of the Hive > changes can be found here: https://issues.apache.org/jira/browse/HIVE-9508 > {code:java} > Exception in thread "main" java.lang.NoSuchFieldError: > METASTORE_CLIENT_SOCKET_LIFETIME > at > org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:500) > at > org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250) > at > org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237) > at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:441) > at > org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272) > at > org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at scala.collection.AbstractIterable.foreach(Iterable.scala:54) > at org.apache.spark.sql.SQLContext.(SQLContext.scala:271) > at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:90) > at 
org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:101) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:58) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:267) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:139) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14492) Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; its not backwards compatible with earlier version
[ https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-14492. --- Resolution: Not A Problem [~sunil.rangwani] please don't reopen this unless you have new information. JIRA is not a place to ask people to do investigation, if that's what you're requesting, but for you to propose changes or report your own detailed investigation. > Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; its not > backwards compatible with earlier version > --- > > Key: SPARK-14492 > URL: https://issues.apache.org/jira/browse/SPARK-14492 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Sunil Rangwani >Priority: Critical > > Spark SQL when configured with a Hive version lower than 1.2.0 throws a > java.lang.NoSuchFieldError for the field METASTORE_CLIENT_SOCKET_LIFETIME > because this field was introduced in Hive 1.2.0 so its not possible to use > Hive metastore version lower than 1.2.0 with Spark. 
The details of the Hive > changes can be found here: https://issues.apache.org/jira/browse/HIVE-9508 > {code:java} > Exception in thread "main" java.lang.NoSuchFieldError: > METASTORE_CLIENT_SOCKET_LIFETIME > at > org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:500) > at > org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250) > at > org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237) > at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:441) > at > org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272) > at > org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at scala.collection.AbstractIterable.foreach(Iterable.scala:54) > at org.apache.spark.sql.SQLContext.(SQLContext.scala:271) > at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:90) > at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:101) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:58) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:267) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:139) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) > at > 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
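For reference, the metastore configuration the SPARK-14492 reporter describes would typically be expressed as a spark-defaults.conf fragment like the following. This is a hedged sketch, not a quote of the reporter's setup; the jar paths are placeholders.

```properties
# Point Spark SQL at a Hive 1.0.0 metastore client instead of the built-in
# 1.2.1 classes. Paths are placeholders for wherever the Hive 1.0.0 client
# jars (plus the Hadoop client classes they need) actually live.
spark.sql.hive.metastore.version  1.0.0
spark.sql.hive.metastore.jars     /path/to/hive-1.0.0/lib/*:/path/to/hadoop/client/*
```

It is exactly this configuration path that fails in the report above: the `HiveContext.configure` code references a Hive 1.2.0-only constant regardless of the configured metastore version.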
[jira] [Closed] (SPARK-14492) Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; its not backwards compatible with earlier version
[ https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen closed SPARK-14492. - > Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; its not > backwards compatible with earlier version > --- > > Key: SPARK-14492 > URL: https://issues.apache.org/jira/browse/SPARK-14492 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Sunil Rangwani >Priority: Critical > > Spark SQL when configured with a Hive version lower than 1.2.0 throws a > java.lang.NoSuchFieldError for the field METASTORE_CLIENT_SOCKET_LIFETIME > because this field was introduced in Hive 1.2.0 so its not possible to use > Hive metastore version lower than 1.2.0 with Spark. The details of the Hive > changes can be found here: https://issues.apache.org/jira/browse/HIVE-9508 > {code:java} > Exception in thread "main" java.lang.NoSuchFieldError: > METASTORE_CLIENT_SOCKET_LIFETIME > at > org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:500) > at > org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250) > at > org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237) > at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:441) > at > org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272) > at > org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at scala.collection.AbstractIterable.foreach(Iterable.scala:54) > at org.apache.spark.sql.SQLContext.(SQLContext.scala:271) > at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:90) > at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:101) > at > 
org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:58) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:267) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:139) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20210) Scala tests aborted in Spark SQL on ppc64le
[ https://issues.apache.org/jira/browse/SPARK-20210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955032#comment-15955032 ] Sonia Garudi commented on SPARK-20210: -- [~srowen], it's not a flaky test, although I am not sure whether it's due to the PPC arch. I have tested it on both ppc64le and x86 arch. The tests ran and passed on the x86 platform. > Scala tests aborted in Spark SQL on ppc64le > --- > > Key: SPARK-20210 > URL: https://issues.apache.org/jira/browse/SPARK-20210 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.2.0 > Environment: Ubuntu 14.04 ppc64le > $ java -version > openjdk version "1.8.0_111" > OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-3~14.04.1-b14) > OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode) >Reporter: Sonia Garudi >Priority: Minor > Labels: ppc64le > > The tests get aborted with the following error : > {code} > *** RUN ABORTED *** > org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 > seconds]. 
This timeout is controlled by spark.rpc.askTimeout > at > org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76) > at > org.apache.spark.storage.BlockManagerMaster.removeRdd(BlockManagerMaster.scala:125) > at > org.apache.spark.SparkContext.unpersistRDD(SparkContext.scala:1792) > at org.apache.spark.rdd.RDD.unpersist(RDD.scala:216) > at > org.apache.spark.sql.execution.CacheManager$$anonfun$clearCache$1$$anonfun$apply$mcV$sp$1.apply(CacheManager.scala:75) > at > org.apache.spark.sql.execution.CacheManager$$anonfun$clearCache$1$$anonfun$apply$mcV$sp$1.apply(CacheManager.scala:75) > ... > Cause: java.util.concurrent.TimeoutException: Futures timed out after > [120 seconds] > at > scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) > at > scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) > at > org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) > at > org.apache.spark.storage.BlockManagerMaster.removeRdd(BlockManagerMaster.scala:125) > at > org.apache.spark.SparkContext.unpersistRDD(SparkContext.scala:1792) > at org.apache.spark.rdd.RDD.unpersist(RDD.scala:216) > at > org.apache.spark.sql.execution.CacheManager$$anonfun$clearCache$1$$anonfun$apply$mcV$sp$1.apply(CacheManager.scala:75) > at > 
org.apache.spark.sql.execution.CacheManager$$anonfun$clearCache$1$$anonfun$apply$mcV$sp$1.apply(CacheManager.scala:75) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > ... > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
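The abort message above points at spark.rpc.askTimeout. On slow or less-exercised platforms such as this ppc64le environment, a common first diagnostic step is to raise that timeout for the test run. A hedged sketch (the values are illustrative, not a recommendation):

```properties
# Illustrative only: raise the RPC ask timeout (and the network timeout it
# defaults from) so a slow test environment doesn't abort at 120 seconds.
spark.rpc.askTimeout   600s
spark.network.timeout  600s
```

If the tests still time out with a much larger value, that points at a genuine hang rather than a slow machine.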
[jira] [Commented] (SPARK-20202) Remove references to org.spark-project.hive
[ https://issues.apache.org/jira/browse/SPARK-20202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955188#comment-15955188 ] Sean Owen commented on SPARK-20202: --- [~marmbrus] as release manager of the moment, I suggest we actually formally vote on releasing the org.spark-project.hive artifact, as I'm not clear we ever did formally. That much seems like a must-have. I don't know that it requires re-releasing the artifacts, but at least having a meaningful review of what it is, and agreeing (or not) that it's what the PMC wants to release, would, I believe, resolve doubts about the legitimacy of that artifact. There's still more to do, no doubt, to get rid of the fork. This might include seeing if Hive 1.2.x can provide an un-uberized artifact. > Remove references to org.spark-project.hive > --- > > Key: SPARK-20202 > URL: https://issues.apache.org/jira/browse/SPARK-20202 > Project: Spark > Issue Type: Bug > Components: Build, SQL >Affects Versions: 1.6.4, 2.0.3, 2.1.1 >Reporter: Owen O'Malley >Priority: Blocker > > Spark can't continue to depend on their fork of Hive and must move to > standard Hive versions. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12606) Scala/Java compatibility issue Re: how to extend java transformer from Scala UnaryTransformer ?
[ https://issues.apache.org/jira/browse/SPARK-12606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955191#comment-15955191 ] Guillaume Dardelet commented on SPARK-12606: I had the same issue in Scala and I solved it by overloading the constructor so that it initialises the UID. The error comes from the initialisation of the parameter "inputCol". You get "null__inputCol" because when the parameter was initialised, your class didn't have a uid. Therefore, instead of class Lemmatizer extends UnaryTransformer[String, String, Lemmatizer] { override val uid: String = Identifiable.randomUID("lemmatizer") protected def createTransformFunc: String => String = ??? protected def outputDataType: DataType = StringType } Do this: class Lemmatizer(override val uid: String) extends UnaryTransformer[String, String, Lemmatizer] { def this() = this( Identifiable.randomUID("lemmatizer") ) protected def createTransformFunc: String => String = ??? protected def outputDataType: DataType = StringType } > Scala/Java compatibility issue Re: how to extend java transformer from Scala > UnaryTransformer ? > --- > > Key: SPARK-12606 > URL: https://issues.apache.org/jira/browse/SPARK-12606 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 1.5.2 > Environment: Java 8, Mac OS, Spark-1.5.2 >Reporter: Andrew Davidson > Labels: transformers > > Hi Andy, > I suspect that you hit the Scala/Java compatibility issue, I can also > reproduce this issue, so could you file a JIRA to track this issue? > Yanbo > 2016-01-02 3:38 GMT+08:00 Andy Davidson : > I am trying to write a trivial transformer I use use in my pipeline. I am > using java and spark 1.5.2. It was suggested that I use the Tokenize.scala > class as an example. This should be very easy how ever I do not understand > Scala, I am having trouble debugging the following exception. > Any help would be greatly appreciated. 
> Happy New Year > Andy > java.lang.IllegalArgumentException: requirement failed: Param null__inputCol > does not belong to Stemmer_2f3aa96d-7919-4eaa-ad54-f7c620b92d1c. > at scala.Predef$.require(Predef.scala:233) > at org.apache.spark.ml.param.Params$class.shouldOwn(params.scala:557) > at org.apache.spark.ml.param.Params$class.set(params.scala:436) > at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:37) > at org.apache.spark.ml.param.Params$class.set(params.scala:422) > at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:37) > at > org.apache.spark.ml.UnaryTransformer.setInputCol(Transformer.scala:83) > at com.pws.xxx.ml.StemmerTest.test(StemmerTest.java:30) > public class StemmerTest extends AbstractSparkTest { > @Test > public void test() { > Stemmer stemmer = new Stemmer() > .setInputCol("raw”) //line 30 > .setOutputCol("filtered"); > } > } > /** > * @ see > spark-1.5.1/mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala > * @ see > https://chimpler.wordpress.com/2014/06/11/classifiying-documents-using-naive-bayes-on-apache-spark-mllib/ > * @ see > http://www.tonytruong.net/movie-rating-prediction-with-apache-spark-and-hortonworks/ > * > * @author andrewdavidson > * > */ > public class Stemmer extends UnaryTransformer, List, > Stemmer> implements Serializable{ > static Logger logger = LoggerFactory.getLogger(Stemmer.class); > private static final long serialVersionUID = 1L; > private static final ArrayType inputType = > DataTypes.createArrayType(DataTypes.StringType, true); > private final String uid = Stemmer.class.getSimpleName() + "_" + > UUID.randomUUID().toString(); > @Override > public String uid() { > return uid; > } > /* >override protected def validateInputType(inputType: DataType): Unit = { > require(inputType == StringType, s"Input type must be string type but got > $inputType.") > } > */ > @Override > public void validateInputType(DataType inputTypeArg) { > String msg = "inputType must be " + inputType.simpleString() + " but 
> got " + inputTypeArg.simpleString(); > assert (inputType.equals(inputTypeArg)) : msg; > } > > @Override > public Function1, List> createTransformFunc() { > // > http://stackoverflow.com/questions/6545066/using-scala-from-java-passing-functions-as-parameters > Function1, List> f = new > AbstractFunction1, List>() { > public List apply(List words) { > for(String word : words) { > logger.error("AEDWIP input word: {}", word); > } > return wor
[jira] [Comment Edited] (SPARK-12606) Scala/Java compatibility issue Re: how to extend java transformer from Scala UnaryTransformer ?
[ https://issues.apache.org/jira/browse/SPARK-12606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955191#comment-15955191 ] Guillaume Dardelet edited comment on SPARK-12606 at 4/4/17 2:27 PM: I had the same issue in Scala and I solved it by overloading the constructor so that it initialises the UID. The error comes from the initialisation of the parameter "inputCol". You get "null__inputCol" because when the parameter was initialised, your class didn't have a uid. Therefore, instead of {code} class Lemmatizer extends UnaryTransformer[String, String, Lemmatizer] { override val uid: String = Identifiable.randomUID("lemmatizer") protected def createTransformFunc: String => String = ??? protected def outputDataType: DataType = StringType } {code} Do this: {code} class Lemmatizer(override val uid: String) extends UnaryTransformer[String, String, Lemmatizer] { def this() = this( Identifiable.randomUID("lemmatizer") ) protected def createTransformFunc: String => String = ??? protected def outputDataType: DataType = StringType } {code} was (Author: panoramix): I had the same issue in Scala and I solved it by overloading the constructor so that it initialises the UID. The error comes from the initialisation of the parameter "inputCol". You get "null__inputCol" because when the parameter was initialised, your class didn't have a uid. Therefore, instead of {code:scala} class Lemmatizer extends UnaryTransformer[String, String, Lemmatizer] { override val uid: String = Identifiable.randomUID("lemmatizer") protected def createTransformFunc: String => String = ??? protected def outputDataType: DataType = StringType } {code} Do this: class Lemmatizer(override val uid: String) extends UnaryTransformer[String, String, Lemmatizer] { def this() = this( Identifiable.randomUID("lemmatizer") ) protected def createTransformFunc: String => String = ??? 
protected def outputDataType: DataType = StringType } > Scala/Java compatibility issue Re: how to extend java transformer from Scala > UnaryTransformer ? > --- > > Key: SPARK-12606 > URL: https://issues.apache.org/jira/browse/SPARK-12606 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 1.5.2 > Environment: Java 8, Mac OS, Spark-1.5.2 >Reporter: Andrew Davidson > Labels: transformers > > Hi Andy, > I suspect that you hit the Scala/Java compatibility issue, I can also > reproduce this issue, so could you file a JIRA to track this issue? > Yanbo > 2016-01-02 3:38 GMT+08:00 Andy Davidson : > I am trying to write a trivial transformer I use use in my pipeline. I am > using java and spark 1.5.2. It was suggested that I use the Tokenize.scala > class as an example. This should be very easy how ever I do not understand > Scala, I am having trouble debugging the following exception. > Any help would be greatly appreciated. > Happy New Year > Andy > java.lang.IllegalArgumentException: requirement failed: Param null__inputCol > does not belong to Stemmer_2f3aa96d-7919-4eaa-ad54-f7c620b92d1c. 
> at scala.Predef$.require(Predef.scala:233) > at org.apache.spark.ml.param.Params$class.shouldOwn(params.scala:557) > at org.apache.spark.ml.param.Params$class.set(params.scala:436) > at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:37) > at org.apache.spark.ml.param.Params$class.set(params.scala:422) > at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:37) > at > org.apache.spark.ml.UnaryTransformer.setInputCol(Transformer.scala:83) > at com.pws.xxx.ml.StemmerTest.test(StemmerTest.java:30) > public class StemmerTest extends AbstractSparkTest { > @Test > public void test() { > Stemmer stemmer = new Stemmer() > .setInputCol("raw”) //line 30 > .setOutputCol("filtered"); > } > } > /** > * @ see > spark-1.5.1/mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala > * @ see > https://chimpler.wordpress.com/2014/06/11/classifiying-documents-using-naive-bayes-on-apache-spark-mllib/ > * @ see > http://www.tonytruong.net/movie-rating-prediction-with-apache-spark-and-hortonworks/ > * > * @author andrewdavidson > * > */ > public class Stemmer extends UnaryTransformer, List, > Stemmer> implements Serializable{ > static Logger logger = LoggerFactory.getLogger(Stemmer.class); > private static final long serialVersionUID = 1L; > private static final ArrayType inputType = > DataTypes.createArrayType(DataTypes.StringType, true); > private final String uid = Stemmer.class.getSimpleName() + "_" + > UUID.randomUUID().toString(); > @Override > public Strin
[jira] [Comment Edited] (SPARK-12606) Scala/Java compatibility issue Re: how to extend java transformer from Scala UnaryTransformer ?
[ https://issues.apache.org/jira/browse/SPARK-12606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955191#comment-15955191 ] Guillaume Dardelet edited comment on SPARK-12606 at 4/4/17 2:27 PM: I had the same issue in Scala and I solved it by overloading the constructor so that it initialises the UID. The error comes from the initialisation of the parameter "inputCol". You get "null__inputCol" because when the parameter was initialised, your class didn't have a uid. Therefore, instead of {code:scala} class Lemmatizer extends UnaryTransformer[String, String, Lemmatizer] { override val uid: String = Identifiable.randomUID("lemmatizer") protected def createTransformFunc: String => String = ??? protected def outputDataType: DataType = StringType } {code} Do this: class Lemmatizer(override val uid: String) extends UnaryTransformer[String, String, Lemmatizer] { def this() = this( Identifiable.randomUID("lemmatizer") ) protected def createTransformFunc: String => String = ??? protected def outputDataType: DataType = StringType } was (Author: panoramix): I had the same issue in Scala and I solved it by overloading the constructor so that it initialises the UID. The error comes from the initialisation of the parameter "inputCol". You get "null__inputCol" because when the parameter was initialised, your class didn't have a uid. Therefore, instead of class Lemmatizer extends UnaryTransformer[String, String, Lemmatizer] { override val uid: String = Identifiable.randomUID("lemmatizer") protected def createTransformFunc: String => String = ??? protected def outputDataType: DataType = StringType } Do this: class Lemmatizer(override val uid: String) extends UnaryTransformer[String, String, Lemmatizer] { def this() = this( Identifiable.randomUID("lemmatizer") ) protected def createTransformFunc: String => String = ??? 
protected def outputDataType: DataType = StringType } > Scala/Java compatibility issue Re: how to extend java transformer from Scala > UnaryTransformer ? > --- > > Key: SPARK-12606 > URL: https://issues.apache.org/jira/browse/SPARK-12606 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 1.5.2 > Environment: Java 8, Mac OS, Spark-1.5.2 >Reporter: Andrew Davidson > Labels: transformers > > Hi Andy, > I suspect that you hit the Scala/Java compatibility issue, I can also > reproduce this issue, so could you file a JIRA to track this issue? > Yanbo > 2016-01-02 3:38 GMT+08:00 Andy Davidson : > I am trying to write a trivial transformer I use use in my pipeline. I am > using java and spark 1.5.2. It was suggested that I use the Tokenize.scala > class as an example. This should be very easy how ever I do not understand > Scala, I am having trouble debugging the following exception. > Any help would be greatly appreciated. > Happy New Year > Andy > java.lang.IllegalArgumentException: requirement failed: Param null__inputCol > does not belong to Stemmer_2f3aa96d-7919-4eaa-ad54-f7c620b92d1c. 
> at scala.Predef$.require(Predef.scala:233) > at org.apache.spark.ml.param.Params$class.shouldOwn(params.scala:557) > at org.apache.spark.ml.param.Params$class.set(params.scala:436) > at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:37) > at org.apache.spark.ml.param.Params$class.set(params.scala:422) > at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:37) > at > org.apache.spark.ml.UnaryTransformer.setInputCol(Transformer.scala:83) > at com.pws.xxx.ml.StemmerTest.test(StemmerTest.java:30) > public class StemmerTest extends AbstractSparkTest { > @Test > public void test() { > Stemmer stemmer = new Stemmer() > .setInputCol("raw”) //line 30 > .setOutputCol("filtered"); > } > } > /** > * @ see > spark-1.5.1/mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala > * @ see > https://chimpler.wordpress.com/2014/06/11/classifiying-documents-using-naive-bayes-on-apache-spark-mllib/ > * @ see > http://www.tonytruong.net/movie-rating-prediction-with-apache-spark-and-hortonworks/ > * > * @author andrewdavidson > * > */ > public class Stemmer extends UnaryTransformer, List, > Stemmer> implements Serializable{ > static Logger logger = LoggerFactory.getLogger(Stemmer.class); > private static final long serialVersionUID = 1L; > private static final ArrayType inputType = > DataTypes.createArrayType(DataTypes.StringType, true); > private final String uid = Stemmer.class.getSimpleName() + "_" + > UUID.randomUUID().toString(); > @Override > public String uid() { > return u
[jira] [Created] (SPARK-20211) `1 > 0.0001` throws Decimal scale (0) cannot be greater than precision (-2) exception
StanZhai created SPARK-20211: Summary: `1 > 0.0001` throws Decimal scale (0) cannot be greater than precision (-2) exception Key: SPARK-20211 URL: https://issues.apache.org/jira/browse/SPARK-20211 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.1.0, 2.0.2, 2.0.1, 2.0.0, 2.1.1 Reporter: StanZhai Priority: Critical The following SQL: {code} select 1 > 0.0001 from tb {code} throws Decimal scale (0) cannot be greater than precision (-2) exception in Spark 2.x. `floor(0.0001)` and `ceil(0.0001)` have the same problem in Spark 1.6.x and Spark 2.x. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-20211) `1 > 0.0001` throws Decimal scale (0) cannot be greater than precision (-2) exception
[ https://issues.apache.org/jira/browse/SPARK-20211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-20211: - Priority: Minor (was: Critical) > `1 > 0.0001` throws Decimal scale (0) cannot be greater than precision (-2) > exception > - > > Key: SPARK-20211 > URL: https://issues.apache.org/jira/browse/SPARK-20211 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1 >Reporter: StanZhai >Priority: Minor > Labels: correctness > > The following SQL: > {code} > select 1 > 0.0001 from tb > {code} > throws Decimal scale (0) cannot be greater than precision (-2) exception in > Spark 2.x. > `floor(0.0001)` and `ceil(0.0001)` have the same problem in Spark 1.6.x and > Spark 2.x. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20211) `1 > 0.0001` throws Decimal scale (0) cannot be greater than precision (-2) exception
[ https://issues.apache.org/jira/browse/SPARK-20211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955239#comment-15955239 ] Hyukjin Kwon commented on SPARK-20211: -- Please refer to http://spark.apache.org/contributing.html {quote} Critical: a large minority of users are missing important functionality without this, and/or a workaround is difficult {quote} A workaround is shown below:
{code}
scala> sql("select double(1) > double(0.0001)").show()
+--------------------------------------------+
|(CAST(1 AS DOUBLE) > CAST(0.0001 AS DOUBLE))|
+--------------------------------------------+
|                                        true|
+--------------------------------------------+
{code}
Also, it states, {quote} Priority. Set to Major or below; higher priorities are generally reserved for committers to set {quote} > `1 > 0.0001` throws Decimal scale (0) cannot be greater than precision (-2) > exception > - > > Key: SPARK-20211 > URL: https://issues.apache.org/jira/browse/SPARK-20211 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1 >Reporter: StanZhai >Priority: Critical > Labels: correctness > > The following SQL: > {code} > select 1 > 0.0001 from tb > {code} > throws Decimal scale (0) cannot be greater than precision (-2) exception in > Spark 2.x. > `floor(0.0001)` and `ceil(0.0001)` have the same problem in Spark 1.6.x and > Spark 2.x. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
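To see why the "precision (-2)" in SPARK-20211 looks like a type-inference bug rather than a genuinely unrepresentable comparison, here is a sketch (plain Java, no Spark) of the decimal widening a comparison like `1 > 0.0001` should go through. The widening rule below, and the assumption that the literals carry types Decimal(1, 0) and Decimal(4, 4), are my reading of Spark's decimal promotion, not a quote of the implementation.

```java
import java.math.BigDecimal;

public class DecimalWidenDemo {

    // Widen Decimal(p1, s1) and Decimal(p2, s2) to a common type for
    // comparison: keep the larger integral part and the larger scale.
    static int[] widenForComparison(int p1, int s1, int p2, int s2) {
        int scale = Math.max(s1, s2);
        int intDigits = Math.max(p1 - s1, p2 - s2);
        return new int[] { intDigits + scale, scale };
    }

    public static void main(String[] args) {
        // Assuming 1 is Decimal(1, 0) and 0.0001 is Decimal(4, 4), the
        // widened type is Decimal(5, 4) -- perfectly representable, so a
        // negative precision in the error points at how the literal types
        // were inferred, not at the comparison itself.
        int[] widened = widenForComparison(1, 0, 4, 4);
        System.out.println("precision=" + widened[0] + " scale=" + widened[1]);

        // At full precision the comparison is unremarkable:
        System.out.println(new BigDecimal("1").compareTo(new BigDecimal("0.0001")) > 0); // true
    }
}
```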
[jira] [Assigned] (SPARK-20211) `1 > 0.0001` throws Decimal scale (0) cannot be greater than precision (-2) exception
[ https://issues.apache.org/jira/browse/SPARK-20211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20211: Assignee: Apache Spark > `1 > 0.0001` throws Decimal scale (0) cannot be greater than precision (-2) > exception > - > > Key: SPARK-20211 > URL: https://issues.apache.org/jira/browse/SPARK-20211 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1 >Reporter: StanZhai >Assignee: Apache Spark >Priority: Minor > Labels: correctness > > The following SQL: > {code} > select 1 > 0.0001 from tb > {code} > throws Decimal scale (0) cannot be greater than precision (-2) exception in > Spark 2.x. > `floor(0.0001)` and `ceil(0.0001)` have the same problem in Spark 1.6.x and > Spark 2.x. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-20211) `1 > 0.0001` throws Decimal scale (0) cannot be greater than precision (-2) exception
[ https://issues.apache.org/jira/browse/SPARK-20211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20211: Assignee: (was: Apache Spark) > `1 > 0.0001` throws Decimal scale (0) cannot be greater than precision (-2) > exception > - > > Key: SPARK-20211 > URL: https://issues.apache.org/jira/browse/SPARK-20211 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1 >Reporter: StanZhai >Priority: Minor > Labels: correctness > > The following SQL: > {code} > select 1 > 0.0001 from tb > {code} > throws Decimal scale (0) cannot be greater than precision (-2) exception in > Spark 2.x. > `floor(0.0001)` and `ceil(0.0001)` have the same problem in Spark 1.6.x and > Spark 2.x. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20211) `1 > 0.0001` throws Decimal scale (0) cannot be greater than precision (-2) exception
[ https://issues.apache.org/jira/browse/SPARK-20211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955243#comment-15955243 ] Apache Spark commented on SPARK-20211: -- User 'stanzhai' has created a pull request for this issue: https://github.com/apache/spark/pull/17529 > `1 > 0.0001` throws Decimal scale (0) cannot be greater than precision (-2) > exception > - > > Key: SPARK-20211 > URL: https://issues.apache.org/jira/browse/SPARK-20211 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1 >Reporter: StanZhai >Priority: Minor > Labels: correctness > > The following SQL: > {code} > select 1 > 0.0001 from tb > {code} > throws Decimal scale (0) cannot be greater than precision (-2) exception in > Spark 2.x. > `floor(0.0001)` and `ceil(0.0001)` have the same problem in Spark 1.6.x and > Spark 2.x. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-20211) `1 > 0.0001` throws Decimal scale (0) cannot be greater than precision (-2) exception
[ https://issues.apache.org/jira/browse/SPARK-20211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] StanZhai updated SPARK-20211: - Priority: Major (was: Minor) > `1 > 0.0001` throws Decimal scale (0) cannot be greater than precision (-2) > exception > - > > Key: SPARK-20211 > URL: https://issues.apache.org/jira/browse/SPARK-20211 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1 >Reporter: StanZhai > Labels: correctness > > The following SQL: > {code} > select 1 > 0.0001 from tb > {code} > throws Decimal scale (0) cannot be greater than precision (-2) exception in > Spark 2.x. > `floor(0.0001)` and `ceil(0.0001)` have the same problem in Spark 1.6.x and > Spark 2.x. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14492) Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; its not backwards compatible with earlier version
[ https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955284#comment-15955284 ] Sunil Rangwani commented on SPARK-14492: [~srowen] Please don't close with a comment "should work" without any helpful contribution. Let me describe the issue again:
1. Spark was built with -Phive, which uses Hive version 1.2.1.
2. The config properties {{spark.sql.hive.metastore.jars}} and {{spark.sql.hive.metastore.version}} were set for Hive version 1.0.0 as described in the documentation.
3. Tried using Spark SQL by pointing to a remote external metastore service.
Setting 1.2.1 in step #2 did not work because the external metastore was an older version. Upgrading the external metastore to 1.2.0 {color:red}*(not 1.2.1)*{color} worked. The reason is that Spark uses a Hive config property here https://github.com/apache/spark/blob/branch-1.6/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L500 that was introduced only in Hive 1.2.0 as part of https://issues.apache.org/jira/browse/HIVE-9508 > Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; its not > backwards compatible with earlier version > --- > > Key: SPARK-14492 > URL: https://issues.apache.org/jira/browse/SPARK-14492 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Sunil Rangwani >Priority: Critical > > Spark SQL when configured with a Hive version lower than 1.2.0 throws a > java.lang.NoSuchFieldError for the field METASTORE_CLIENT_SOCKET_LIFETIME > because this field was introduced in Hive 1.2.0 so its not possible to use > Hive metastore version lower than 1.2.0 with Spark. 
The details of the Hive > changes can be found here: https://issues.apache.org/jira/browse/HIVE-9508 > {code:java} > Exception in thread "main" java.lang.NoSuchFieldError: > METASTORE_CLIENT_SOCKET_LIFETIME > at > org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:500) > at > org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250) > at > org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237) > at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:441) > at > org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272) > at > org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at scala.collection.AbstractIterable.foreach(Iterable.scala:54) > at org.apache.spark.sql.SQLContext.(SQLContext.scala:271) > at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:90) > at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:101) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:58) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:267) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:139) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) > at > 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > {code}
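A NoSuchFieldError like the one above is thrown at link time: code compiled against Hive 1.2.x references a constant that the older Hive jars on the classpath do not define. A hypothetical defensive check (not what Spark actually does) would look the field up reflectively first, so the mismatch fails gracefully instead of crashing:

```java
import java.lang.reflect.Field;

public class MetastoreFieldCheck {
    // Hypothetical guard: resolve the field reflectively instead of
    // referencing it directly, so an older Hive jar yields `false`
    // rather than a NoSuchFieldError during class linking.
    static boolean hasField(String className, String fieldName) {
        try {
            Class<?> cls = Class.forName(className);
            cls.getField(fieldName);   // public fields only, which covers enum constants
            return true;
        } catch (ClassNotFoundException | NoSuchFieldException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // On a classpath without Hive 1.2+ this prints false instead of crashing.
        System.out.println(hasField(
            "org.apache.hadoop.hive.conf.HiveConf$ConfVars",
            "METASTORE_CLIENT_SOCKET_LIFETIME"));
    }
}
```

The class and field names above are the real ones from the stack trace; whether such a guard belongs in HiveContext is a design question for the thread, not something this sketch settles.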
[jira] [Reopened] (SPARK-14492) Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; its not backwards compatible with earlier version
[ https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil Rangwani reopened SPARK-14492:
[jira] [Commented] (SPARK-20211) `1 > 0.0001` throws Decimal scale (0) cannot be greater than precision (-2) exception
[ https://issues.apache.org/jira/browse/SPARK-20211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955285#comment-15955285 ] StanZhai commented on SPARK-20211: -- A workaround is difficult for me, because all of my SQL is generated by a higher-level system; I cannot cast every column to double. FLOOR and CEIL are frequently used functions, and not all users will report this problem to the community when they encounter it. We should pay attention to the correctness of the SQL.
[jira] [Comment Edited] (SPARK-14492) Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; its not backwards compatible with earlier version
[ https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955284#comment-15955284 ] Sunil Rangwani edited comment on SPARK-14492 at 4/4/17 3:51 PM
[jira] [Comment Edited] (SPARK-14492) Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; its not backwards compatible with earlier version
[ https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955284#comment-15955284 ] Sunil Rangwani edited comment on SPARK-14492 at 4/4/17 3:55 PM
[jira] [Commented] (SPARK-14492) Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; its not backwards compatible with earlier version
[ https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955294#comment-15955294 ] Sean Owen commented on SPARK-14492: --- This isn't even consistent with the title of this JIRA. How would updating your external Hive metastore affect the code that Spark uses at runtime?
[jira] [Comment Edited] (SPARK-14492) Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; its not backwards compatible with earlier version
[ https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955284#comment-15955284 ] Sunil Rangwani edited comment on SPARK-14492 at 4/4/17 3:58 PM: revised the earlier comment to say the external metastore service and database were upgraded to version 1.2.0 {color:red}*(not 1.2.1)*{color}
[jira] [Commented] (SPARK-14492) Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; its not backwards compatible with earlier version
[ https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955315#comment-15955315 ] Sunil Rangwani commented on SPARK-14492: {quote} How would updating your external Hive metastore affect the code that Spark uses at runtime? {quote} It can't. It uses the jars specified in spark.sql.hive.metastore.jars property {quote}This isn't even consistent with the title of this JIRA.{quote} I will change the title to make it a better reflection of the actual problem. > Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; its not > backwards compatible with earlier version > --- > > Key: SPARK-14492 > URL: https://issues.apache.org/jira/browse/SPARK-14492 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Sunil Rangwani >Priority: Critical > > Spark SQL when configured with a Hive version lower than 1.2.0 throws a > java.lang.NoSuchFieldError for the field METASTORE_CLIENT_SOCKET_LIFETIME > because this field was introduced in Hive 1.2.0 so its not possible to use > Hive metastore version lower than 1.2.0 with Spark. 
The details of the Hive > changes can be found here: https://issues.apache.org/jira/browse/HIVE-9508 > {code:java} > Exception in thread "main" java.lang.NoSuchFieldError: > METASTORE_CLIENT_SOCKET_LIFETIME > at > org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:500) > at > org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250) > at > org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237) > at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:441) > at > org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272) > at > org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at scala.collection.AbstractIterable.foreach(Iterable.scala:54) > at org.apache.spark.sql.SQLContext.(SQLContext.scala:271) > at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:90) > at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:101) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:58) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:267) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:139) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) > at > 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
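For reference, pointing Spark SQL at an external Hive metastore of a particular version is done with the two properties discussed above. A minimal sketch (the version number and jar path below are illustrative placeholders, not values taken from this issue):
{code:none}
# spark-defaults.conf
# Version of the Hive metastore that Spark should talk to
spark.sql.hive.metastore.version   1.2.0
# Where to load the matching Hive client jars from:
# "builtin" (the default), "maven", or an explicit classpath
spark.sql.hive.metastore.jars      /opt/hive-1.2.0/lib/*
{code}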
[jira] [Commented] (SPARK-14492) Spark SQL 1.6.0 does not work with external Hive metastore version lower than 1.2.0; its not backwards compatible with earlier version
[ https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955317#comment-15955317 ] Sunil Rangwani commented on SPARK-14492: Appears to be a problem with version 2.1.0 too as reported by another user
[jira] [Updated] (SPARK-14492) Spark SQL 1.6.0 does not work with external Hive metastore version lower than 1.2.0; its not backwards compatible with earlier version
[ https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil Rangwani updated SPARK-14492: --- Summary: Spark SQL 1.6.0 does not work with external Hive metastore version lower than 1.2.0; its not backwards compatible with earlier version (was: Spark SQL 1.6.0 does not work with Hive version lower than 1.2.0; its not backwards compatible with earlier version)
[jira] [Comment Edited] (SPARK-14492) Spark SQL 1.6.0 does not work with external Hive metastore version lower than 1.2.0; its not backwards compatible with earlier version
[ https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955315#comment-15955315 ] Sunil Rangwani edited comment on SPARK-14492 at 4/4/17 4:11 PM: {quote} How would updating your external Hive metastore affect the code that Spark uses at runtime? {quote} It can't. It uses the jars specified in spark.sql.hive.metastore.jars property {quote}This isn't even consistent with the title of this JIRA.{quote} I have changed the title to make it a better reflection of the actual problem.
[jira] [Updated] (SPARK-20204) remove SimpleCatalystConf and CatalystConf type alias
[ https://issues.apache.org/jira/browse/SPARK-20204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-20204: Summary: remove SimpleCatalystConf and CatalystConf type alias (was: separate SQLConf into catalyst confs and sql confs) > remove SimpleCatalystConf and CatalystConf type alias > - > > Key: SPARK-20204 > URL: https://issues.apache.org/jira/browse/SPARK-20204 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.2.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan
[jira] [Commented] (SPARK-20026) Document R GLM Tweedie family support in programming guide and code example
[ https://issues.apache.org/jira/browse/SPARK-20026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955342#comment-15955342 ] Wayne Zhang commented on SPARK-20026: - [~felixcheung] Yes, I will work on this. Thanks. > Document R GLM Tweedie family support in programming guide and code example > --- > > Key: SPARK-20026 > URL: https://issues.apache.org/jira/browse/SPARK-20026 > Project: Spark > Issue Type: Bug > Components: Documentation, SparkR >Affects Versions: 2.2.0 >Reporter: Felix Cheung
[jira] [Commented] (SPARK-14492) Spark SQL 1.6.0 does not work with external Hive metastore version lower than 1.2.0; its not backwards compatible with earlier version
[ https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955398#comment-15955398 ] Sean Owen commented on SPARK-14492: --- If this were true, that it doesn't work with 1.2.1 because of a NoSuchFieldError, then it would not have compiled vs Hive 1.2.1 even. Right? The field you show was added in 1.2.0, according to the JIRA, not 1.2.1 Yet you say 1.2.0 works while 1.2.1 doesn't. However given this detail I _do_ agree with you that it looks like there's a real problem here, though I'd imagine it manifests with 1.1.x and earlier. Is that what you meant? it's what your JIRA says but not your last comment.
[jira] [Created] (SPARK-20212) UDFs with Option[Primitive Type] don't work as expected
Marius Feteanu created SPARK-20212: -- Summary: UDFs with Option[Primitive Type] don't work as expected Key: SPARK-20212 URL: https://issues.apache.org/jira/browse/SPARK-20212 Project: Spark Issue Type: Bug Components: Optimizer Affects Versions: 2.1.0 Reporter: Marius Feteanu Priority: Minor The documentation for ScalaUDF says: {code:none} Note that if you use primitive parameters, you are not able to check if it is null or not, and the UDF will return null for you if the primitive input is null. Use boxed type or [[Option]] if you wanna do the null-handling yourself. {code} This works with boxed types:
{code:none}
import org.apache.spark.sql.functions.{col, udf}
import spark.implicits._

def is_null_box(x: java.lang.Long): String = {
  x match {
    case _: java.lang.Long => "Yep"
    case null => "No man"
  }
}
val is_null_box_udf = udf(is_null_box _)
val sample = (1L to 5L).toList.map(x => new java.lang.Long(x)) ++ List[java.lang.Long](null, null)
val df = sample.toDF("col1")
df.select(is_null_box_udf(col("col1"))).show(10)
{code}
But it does not work with Option\[Long\] as expected:
{code:none}
import org.apache.spark.sql.functions.{col, udf}
import spark.implicits._

def is_null_opt(x: Option[Long]): String = {
  x match {
    case Some(_: Long) => "Yep"
    case None => "No man"
  }
}
val is_null_opt_udf = udf(is_null_opt _)
val sample = (1L to 5L)
// This does not help: val sample = (1L to 5L).map(Some(_)).toList
val df = sample.toDF("col1")
df.select(is_null_opt_udf(col("col1"))).show(10)
{code}
That throws:
{code:none}
Caused by: java.lang.ClassCastException: java.lang.Long cannot be cast to scala.Option
        at $anonfun$1.apply(:56)
        at org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:89)
        at org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:88)
        at org.apache.spark.sql.catalyst.expressions.ScalaUDF.eval(ScalaUDF.scala:1069)
{code}
The current workaround is to use boxed types, but it makes for code that looks funny.
If you just use Long instead of boxing, the code may break in subtle ways (i.e. it does not fail; it returns null). That's documented but easy to miss (i.e. not part of the bug, but if someone "corrects" boxed functions to use primitive types they might get surprising results):
{code:none}
import org.apache.spark.sql.functions.{col, udf, expr}
import spark.implicits._

def is_null_opt(x: Long): String = {
  Option(x) match {
    case Some(_: Long) => "Yep"
    case None => "No man"
  }
}
val is_null_opt_udf = udf(is_null_opt _)
val sample = (1L to 5L)
val df = sample.toDF("col3").select(expr("CASE WHEN col3=2 THEN NULL ELSE col3 END").alias("col3"))
df.printSchema
df.select(is_null_opt_udf(col("col3"))).show(10)
{code}
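One null-safe pattern, sketched here as a possible workaround (untested; the function name is ours, and a SparkSession named {{spark}} is assumed), is to accept the boxed type at the UDF boundary and convert it to Option immediately. That avoids both the ClassCastException of Option parameters and the silent-null behaviour of primitive parameters:
{code:none}
import org.apache.spark.sql.functions.{col, udf}
import spark.implicits._

// Boxed parameter so that null can actually reach the function;
// Option(x) turns null into None at the boundary.
def is_null_safe(x: java.lang.Long): String =
  Option(x) match {
    case Some(_) => "Yep"
    case None    => "No man"
  }

val is_null_safe_udf = udf(is_null_safe _)

val sample = (1L to 5L).toList.map(java.lang.Long.valueOf) ++
  List[java.lang.Long](null, null)
sample.toDF("col1").select(is_null_safe_udf(col("col1"))).show()
{code}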
[jira] [Resolved] (SPARK-20161) Default log4j properties file should print thread-id in ConversionPattern
[ https://issues.apache.org/jira/browse/SPARK-20161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-20161. --- Resolution: Won't Fix Target Version/s: (was: 2.2.0) > Default log4j properties file should print thread-id in ConversionPattern > - > > Key: SPARK-20161 > URL: https://issues.apache.org/jira/browse/SPARK-20161 > Project: Spark > Issue Type: Improvement > Components: Deploy, YARN >Affects Versions: 2.1.0 >Reporter: Sahil Takiar > > The default log4j file in {{spark/conf/log4j.properties.template}} doesn't > display the thread-id when printing out the logs. It would be very useful to > add this, especially for YARN. Currently, logs from all the different threads > in a single executor are sent to the same log file. This makes debugging > difficult, as it is hard to tell which logs come from which thread.
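For anyone who wants this locally despite the Won't Fix: adding {{%t}} (the thread name) to the ConversionPattern in a copied {{conf/log4j.properties}} achieves the same effect. A sketch, assuming the stock log4j 1.2 console appender (the values below are illustrative, not the actual template contents):
{code:none}
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
# %t prints the logging thread's name, so per-thread filtering works
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p [%t] %c{1}: %m%n
{code}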
[jira] [Commented] (SPARK-14492) Spark SQL 1.6.0 does not work with external Hive metastore version lower than 1.2.0; its not backwards compatible with earlier version
[ https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955510#comment-15955510 ] Sunil Rangwani commented on SPARK-14492: {quote}If this were true, that it doesn't work with 1.2.1 because of a NoSuchFieldError, then it would not have compiled vs Hive 1.2.1 even. Right? {quote} Sorry, to clarify: I meant that if I left the value of the property as 1.2.1 but used an external metastore of an older version, that didn't work. It gave a different error, I don't remember which, but it was not NoSuchFieldError. {quote}The field you show was added in 1.2.0, according to the JIRA, not 1.2.1 Yet you say 1.2.0 works while 1.2.1 doesn't.{quote} All versions upwards of 1.2.0 do work fine. The minimum version of external metastore that works is 1.2.0.
[jira] [Comment Edited] (SPARK-14492) Spark SQL 1.6.0 does not work with external Hive metastore version lower than 1.2.0; its not backwards compatible with earlier version
[ https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955510#comment-15955510 ] Sunil Rangwani edited comment on SPARK-14492 at 4/4/17 5:55 PM: {quote}If this were true, that it doesn't work with 1.2.1 because of a NoSuchFieldError, then it would not have compiled vs Hive 1.2.1 even. Right? {quote} Sorry, to clarify: I meant that if I left the value of the property as 1.2.1 but used an external metastore of an older version, that didn't work. It gives a different error, I don't remember which, but it was not NoSuchFieldError. {quote}The field you show was added in 1.2.0, according to the JIRA, not 1.2.1 Yet you say 1.2.0 works while 1.2.1 doesn't.{quote} All versions upwards of 1.2.0 do work fine. The minimum version of external metastore that works is 1.2.0.
[jira] [Comment Edited] (SPARK-14492) Spark SQL 1.6.0 does not work with external Hive metastore version lower than 1.2.0; its not backwards compatible with earlier version
[ https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955510#comment-15955510 ] Sunil Rangwani edited comment on SPARK-14492 at 4/4/17 6:00 PM: {quote}If this were true, that it doesn't work with 1.2.1 because of a NoSuchFieldError, then it would not have compiled vs Hive 1.2.1 even. Right? {quote} Sorry, to clarify I meant if I left the value of the property as 1.2.1 but used an external metastore of an older version, that didnt work. It gives a different error I don't remember but it was not NoSuchFieldError {quote}The field you show was added in 1.2.0, according to the JIRA, not 1.2.1 Yet you say 1.2.0 works while 1.2.1 doesn't.{quote} All versions upwards of 1.2.0 does work fine. The minimum version of external metastore that works is 1.2.0 (I was just highlighting I didn't have to upgrade upto 1.2.1) All versions upto 1.2.0 give a NoSuchFieldError for different fields. was (Author: sunil.rangwani): {quote}If this were true, that it doesn't work with 1.2.1 because of a NoSuchFieldError, then it would not have compiled vs Hive 1.2.1 even. Right? {quote} Sorry, to clarify I meant if I left the value of the property as 1.2.1 but used an external metastore of an older version, that didnt work. It gives a different error I don't remember but it was not NoSuchFieldError {quote}The field you show was added in 1.2.0, according to the JIRA, not 1.2.1 Yet you say 1.2.0 works while 1.2.1 doesn't.{quote} All versions upwards of 1.2.0 does work fine. The minimum version of external metastore that works is 1.2.0. All versions upto 1.2.0 give a NoSuchFieldError for different fields. 
> Spark SQL 1.6.0 does not work with external Hive metastore version lower than > 1.2.0; its not backwards compatible with earlier version > -- > > Key: SPARK-14492 > URL: https://issues.apache.org/jira/browse/SPARK-14492 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Sunil Rangwani >Priority: Critical > > Spark SQL when configured with a Hive version lower than 1.2.0 throws a > java.lang.NoSuchFieldError for the field METASTORE_CLIENT_SOCKET_LIFETIME > because this field was introduced in Hive 1.2.0 so its not possible to use > Hive metastore version lower than 1.2.0 with Spark. The details of the Hive > changes can be found here: https://issues.apache.org/jira/browse/HIVE-9508 > {code:java} > Exception in thread "main" java.lang.NoSuchFieldError: > METASTORE_CLIENT_SOCKET_LIFETIME > at > org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:500) > at > org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250) > at > org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237) > at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:441) > at > org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272) > at > org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at scala.collection.AbstractIterable.foreach(Iterable.scala:54) > at org.apache.spark.sql.SQLContext.(SQLContext.scala:271) > at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:90) > at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:101) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:58) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:267) > at > 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:139) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) > at org.apache.spark.deploy
[jira] [Comment Edited] (SPARK-14492) Spark SQL 1.6.0 does not work with external Hive metastore version lower than 1.2.0; its not backwards compatible with earlier version
[ https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955510#comment-15955510 ] Sunil Rangwani edited comment on SPARK-14492 at 4/4/17 6:06 PM: {quote}If this were true, that it doesn't work with 1.2.1 because of a NoSuchFieldError, then it would not have compiled vs Hive 1.2.1 even. Right? {quote} Sorry, to clarify: I meant that if I left the value of the property as 1.2.1 but used an external metastore of an older version, that didn't work. It gives a different error, which I don't remember, but it was not NoSuchFieldError. {quote}The field you show was added in 1.2.0, according to the JIRA, not 1.2.1 Yet you say 1.2.0 works while 1.2.1 doesn't.{quote} All versions upwards of 1.2.0 do work fine. The minimum version of external metastore that works is 1.2.0 (I was just highlighting that I didn't have to upgrade to 1.2.1). All versions below 1.2.0 give a NoSuchFieldError for different fields. was (Author: sunil.rangwani): {quote}If this were true, that it doesn't work with 1.2.1 because of a NoSuchFieldError, then it would not have compiled vs Hive 1.2.1 even. Right? {quote} Sorry, to clarify I meant if I left the value of the property as 1.2.1 but used an external metastore of an older version, that didnt work. It gives a different error I don't remember but it was not NoSuchFieldError {quote}The field you show was added in 1.2.0, according to the JIRA, not 1.2.1 Yet you say 1.2.0 works while 1.2.1 doesn't.{quote} All versions upwards of 1.2.0 does work fine. The minimum version of external metastore that works is 1.2.0 (I was just highlighting I didn't have to upgrade upto 1.2.1) All versions upto 1.2.0 give a NoSuchFieldError for different fields. 
[jira] [Comment Edited] (SPARK-5158) Allow for keytab-based HDFS security in Standalone mode
[ https://issues.apache.org/jira/browse/SPARK-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955528#comment-15955528 ] Ian Hummel edited comment on SPARK-5158 at 4/4/17 6:07 PM: --- At Bloomberg we've been working on a solution to this issue so we can access kerberized HDFS clusters from standalone Spark installations we run on our internal cloud infrastructure. Folks who are interested can try out a patch at https://github.com/themodernlife/spark/tree/spark-5158. It extends standalone mode to support configuration related to {{\-\-principal}} and {{\-\-keytab}}. The main changes are - Refactor {{ConfigurableCredentialManager}} and related {{CredentialProviders}} so that they are no longer tied to YARN - Setup credential renewal/updating from within the {{StandaloneSchedulerBackend}} - Ensure executors/drivers are able to find initial tokens for contacting HDFS and renew them at regular intervals The implementation does basically the same thing as the YARN backend. The keytab is copied to driver/executors through an environment variable in the {{ApplicationDescription}}. I might be wrong, but I'm assuming proper {{spark.authenticate}} setup would ensure it's encrypted over-the-wire (can anyone confirm?). Credentials on the executors and the driver (cluster mode) are written to disk as whatever user the Spark daemon runs as. Open to suggestions on whether it's worth tightening that up. Would appreciate any feedback from the community. was (Author: themodernlife): At Bloomberg we've been working on a solution to this issue so we can access kerberized HDFS clusters from standalone Spark installations we run on our internal cloud infrastructure. Folks who are interested can try out a patch at https://github.com/themodernlife/spark/tree/spark-5158. It extends standalone mode to support configuration related to {{--principal}} and {{--keytab}}. 
The main changes are - Refactor {{ConfigurableCredentialManager}} and related {{CredentialProviders}} so that they are no longer tied to YARN - Setup credential renewal/updating from within the {{StandaloneSchedulerBackend}} - Ensure executors/drivers are able to find initial tokens for contacting HDFS and renew them at regular intervals The implementation does basically the same thing as the YARN backend. The keytab is copied to driver/executors through an environment variable in the {{ApplicationDescription}}. I might be wrong, but I'm assuming proper {{spark.authenticate}} setup would ensure it's encrypted over-the-wire (can anyone confirm?). Credentials on the executors and the driver (cluster mode) are written to disk as whatever user the Spark daemon runs as. Open to suggestions on whether it's worth tightening that up. Would appreciate any feedback from the community. > Allow for keytab-based HDFS security in Standalone mode > --- > > Key: SPARK-5158 > URL: https://issues.apache.org/jira/browse/SPARK-5158 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Reporter: Patrick Wendell >Assignee: Matthew Cheah >Priority: Critical > > There have been a handful of patches for allowing access to Kerberized HDFS > clusters in standalone mode. The main reason we haven't accepted these > patches have been that they rely on insecure distribution of token files from > the driver to the other components. > As a simpler solution, I wonder if we should just provide a way to have the > Spark driver and executors independently log in and acquire credentials using > a keytab. This would work for users who have a dedicated, single-tenant, > Spark clusters (i.e. they are willing to have a keytab on every machine > running Spark for their application). It wouldn't address all possible > deployment scenarios, but if it's simple I think it's worth considering. 
> This would also work for Spark streaming jobs, which often run on dedicated > hardware since they are long-running services. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
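A hedged sketch of how the proposed support would be invoked: the {{--principal}}/{{--keytab}} flags already exist for YARN, and their use against a standalone master below assumes the patch described above (the master URL, principal, and keytab path are illustrative):

```shell
spark-submit \
  --master spark://master:7077 \
  --deploy-mode cluster \
  --principal etl-user@EXAMPLE.COM \
  --keytab /etc/security/keytabs/etl-user.keytab \
  --conf spark.authenticate=true \
  my-app.jar
```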
[jira] [Assigned] (SPARK-10364) Support Parquet logical type TIMESTAMP_MILLIS
[ https://issues.apache.org/jira/browse/SPARK-10364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-10364: --- Assignee: Dilip Biswal > Support Parquet logical type TIMESTAMP_MILLIS > - > > Key: SPARK-10364 > URL: https://issues.apache.org/jira/browse/SPARK-10364 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.5.0 >Reporter: Cheng Lian >Assignee: Dilip Biswal > Fix For: 2.2.0 > > > The {{TimestampType}} in Spark SQL is of microsecond precision. Ideally, we > should convert Spark SQL timestamp values into Parquet {{TIMESTAMP_MICROS}}. > But unfortunately parquet-mr hasn't supported it yet. > For the read path, we should be able to read {{TIMESTAMP_MILLIS}} Parquet > values and pad a 0 microsecond part to read values. > For the write path, currently we are writing timestamps as {{INT96}}, similar > to Impala and Hive. One alternative is that, we can have a separate SQL > option to let users be able to write Spark SQL timestamp values as > {{TIMESTAMP_MILLIS}}. Of course, in this way the microsecond part will be > truncated. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
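The read- and write-path conversions described in the ticket are plain unit arithmetic; a minimal illustrative sketch in ordinary Python (not Spark's actual code path):

```python
def millis_to_micros(ts_millis: int) -> int:
    """Read path: pad a zero microsecond part onto a TIMESTAMP_MILLIS value."""
    return ts_millis * 1000

def micros_to_millis(ts_micros: int) -> int:
    """Write path: truncate to milliseconds when emitting TIMESTAMP_MILLIS."""
    return ts_micros // 1000  # sub-millisecond precision is lost here

# Round-tripping a microsecond timestamp through TIMESTAMP_MILLIS truncates it:
print(millis_to_micros(micros_to_millis(1_491_328_000_123_456)))  # 1491328000123000
```

This is why the ticket frames millisecond output as an opt-in SQL option: the truncation on the write path is silent data loss for microsecond-precision values.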
[jira] [Commented] (SPARK-5158) Allow for keytab-based HDFS security in Standalone mode
[ https://issues.apache.org/jira/browse/SPARK-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955564#comment-15955564 ] Apache Spark commented on SPARK-5158: - User 'themodernlife' has created a pull request for this issue: https://github.com/apache/spark/pull/17530 > Allow for keytab-based HDFS security in Standalone mode > --- > > Key: SPARK-5158 > URL: https://issues.apache.org/jira/browse/SPARK-5158 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Reporter: Patrick Wendell >Assignee: Matthew Cheah >Priority: Critical > > There have been a handful of patches for allowing access to Kerberized HDFS > clusters in standalone mode. The main reason we haven't accepted these > patches have been that they rely on insecure distribution of token files from > the driver to the other components. > As a simpler solution, I wonder if we should just provide a way to have the > Spark driver and executors independently log in and acquire credentials using > a keytab. This would work for users who have a dedicated, single-tenant, > Spark clusters (i.e. they are willing to have a keytab on every machine > running Spark for their application). It wouldn't address all possible > deployment scenarios, but if it's simple I think it's worth considering. > This would also work for Spark streaming jobs, which often run on dedicated > hardware since they are long-running services. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-20191) RackResolver not correctly being overridden in YARN tests
[ https://issues.apache.org/jira/browse/SPARK-20191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-20191. Resolution: Fixed Assignee: Marcelo Vanzin Fix Version/s: 2.1.2 2.2.0 > RackResolver not correctly being overridden in YARN tests > - > > Key: SPARK-20191 > URL: https://issues.apache.org/jira/browse/SPARK-20191 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 2.0.3, 2.1.1, 2.2.0 >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin > Fix For: 2.2.0, 2.1.2 > > > YARN tests currently try to override YARN's RackResolver, but that class > self-initializes the first time it's called, storing state in static > variables and ignoring any further config params that might override the > initial behavior. > So we need a better solution for Spark tests, so that tests such as > {{LocalityPlacementStrategySuite}} don't flood the DNS server with requests > (making the test really slow in certain environments). -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-20204) remove SimpleCatalystConf and CatalystConf type alias
[ https://issues.apache.org/jira/browse/SPARK-20204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-20204. - Resolution: Fixed Fix Version/s: 2.2.0 > remove SimpleCatalystConf and CatalystConf type alias > - > > Key: SPARK-20204 > URL: https://issues.apache.org/jira/browse/SPARK-20204 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.2.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan > Fix For: 2.2.0 > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20153) Support Multiple aws credentials in order to access multiple Hive on S3 table in spark application
[ https://issues.apache.org/jira/browse/SPARK-20153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955695#comment-15955695 ] Franck Tago commented on SPARK-20153: - Oh, I would definitely not consider including the key in the URL. It is a gigantic security hole in my opinion. Moreover, consider that I am dealing with Hive on S3, where the URI is part of the table metadata. How would that work in this case? Is there a way to encode the accessId and secret key before calling? Does Spark provide any way of masking or hiding the accessId and secretKey? > Support Multiple aws credentials in order to access multiple Hive on S3 table > in spark application > --- > > Key: SPARK-20153 > URL: https://issues.apache.org/jira/browse/SPARK-20153 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.0.1, 2.1.0 >Reporter: Franck Tago >Priority: Minor > > I need to access multiple hive tables in my spark application where each hive > table is > 1- an external table with data sitting on S3 > 2- each table is own by a different AWS user so I need to provide different > AWS credentials. > I am familiar with setting the aws credentials in the hadoop configuration > object but that does not really help me because I can only set one pair of > (fs.s3a.awsAccessKeyId , fs.s3a.awsSecretAccessKey ) > From my research , there is no easy or elegant way to do this in spark . > Why is that ? > How do I address this use case? -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
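One possible direction, offered as an assumption rather than an existing Spark feature at the time of this thread: newer Hadoop S3A releases (2.8+) support per-bucket configuration, which would let each Hive table's bucket carry its own credential pair instead of the single global fs.s3a pair. A sketch of the Hadoop configuration keys involved (bucket names and values are illustrative):

```xml
<!-- core-site.xml sketch: one credential pair per S3 bucket (Hadoop 2.8+ S3A) -->
<property><name>fs.s3a.bucket.team-a-data.access.key</name><value>AKIA...A</value></property>
<property><name>fs.s3a.bucket.team-a-data.secret.key</name><value>(secret A)</value></property>
<property><name>fs.s3a.bucket.team-b-data.access.key</name><value>AKIA...B</value></property>
<property><name>fs.s3a.bucket.team-b-data.secret.key</name><value>(secret B)</value></property>
```

Hadoop's credential-provider mechanism (the `hadoop credential` CLI with a JCEKS keystore) can keep the secret values out of plain configuration files, which speaks to the masking question above.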
[jira] [Commented] (SPARK-20203) Change default maxPatternLength value to Int.MaxValue in PrefixSpan
[ https://issues.apache.org/jira/browse/SPARK-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955705#comment-15955705 ] yuhao yang commented on SPARK-20203: [~Syrux] Since you got some experiences using the PrefixSpan, I'd like to have your input (or better contribution) in https://issues.apache.org/jira/browse/SPARK-20114 . > Change default maxPatternLength value to Int.MaxValue in PrefixSpan > --- > > Key: SPARK-20203 > URL: https://issues.apache.org/jira/browse/SPARK-20203 > Project: Spark > Issue Type: Wish > Components: MLlib >Affects Versions: 2.1.0 >Reporter: Cyril de Vogelaere >Priority: Trivial > Original Estimate: 0h > Remaining Estimate: 0h > > I think changing the default value to Int.MaxValue would be more user > friendly. At least for new users. > Personally, when I run an algorithm, I expect it to find all solution by > default. And a limited number of them, when I set the parameters to do so. > The current implementation limit the length of solution patterns to 10. > Thus preventing all solution to be printed when running slightly large > datasets. > I feel like that should be changed, but since this would change the default > behavior of PrefixSpan. I think asking for the communities opinion should > come first. So, what do you think ? -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
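To illustrate the effect under discussion — this is a toy sketch, not Spark's PrefixSpan implementation — here is a minimal prefix-growth enumerator over single-item sequences, showing how a small maxPatternLength silently hides longer frequent patterns:

```python
def frequent_patterns(sequences, min_support, max_pattern_length):
    """Toy sequential-pattern miner (illustrative only, not Spark's PrefixSpan)."""
    items = sorted({x for seq in sequences for x in seq})

    def project(db, item):
        # Keep only sequences containing `item`, reduced to their suffix after it.
        return [seq[seq.index(item) + 1:] for seq in db if item in seq]

    results = []

    def grow(prefix, db):
        if len(prefix) >= max_pattern_length:
            return  # patterns longer than the cap are never emitted
        for item in items:
            projected = project(db, item)
            if len(projected) >= min_support:
                results.append(prefix + [item])
                grow(prefix + [item], projected)

    grow([], sequences)
    return results

db = [[1, 2, 3], [1, 2], [1, 3]]
# With a generous cap all 5 frequent patterns appear; a cap of 1 hides [1, 2] and [1, 3].
print(len(frequent_patterns(db, 2, 10)))  # 5
print(len(frequent_patterns(db, 2, 1)))   # 3
```

With a default cap of 10, any dataset whose frequent patterns run longer simply returns fewer results with no warning, which is the user-friendliness concern raised here.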
[jira] [Commented] (SPARK-17495) Hive hash implementation
[ https://issues.apache.org/jira/browse/SPARK-17495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955714#comment-15955714 ] Dominic Ricard commented on SPARK-17495: [~tejasp] We use murmur3 hash internally in some of our data pipelines, non-SQL, and I would like to know if the goal of this task to expose a new UDF (ex: murmur3()), similar to md5() and hash()? I believe that would be the best approach to preserve compatibility with previously generated data and queries using hash(). > Hive hash implementation > > > Key: SPARK-17495 > URL: https://issues.apache.org/jira/browse/SPARK-17495 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Tejas Patil >Assignee: Tejas Patil >Priority: Minor > Fix For: 2.2.0 > > > Spark internally uses Murmur3Hash for partitioning. This is different from > the one used by Hive. For queries which use bucketing this leads to different > results if one tries the same query on both engines. For us, we want users to > have backward compatibility to that one can switch parts of applications > across the engines without observing regressions. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
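For background on what such a UDF would expose: Spark's internal hash is based on the Murmur3 x86 32-bit algorithm (Spark's Murmur3Hash expression uses seed 42). A plain-Python sketch of the reference algorithm — illustrative, not Spark's byte-level implementation:

```python
def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """Reference-style Murmur3 x86_32 over raw bytes (Spark seeds this with 42)."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & 0xFFFFFFFF
    n = len(data)
    # Body: mix each 4-byte little-endian block into the running hash.
    for i in range(0, n - n % 4, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF  # rotl32(k, 15)
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
        h = ((h << 13) | (h >> 19)) & 0xFFFFFFFF  # rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF
    # Tail: fold in the remaining 1-3 bytes, if any.
    tail = data[n - n % 4:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
    # Finalization: length mix plus avalanche.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h
```

Whether to surface this as a dedicated function (e.g. the hypothetical murmur3() mentioned above) alongside hash() and md5() is the API question; the mixing itself is what Spark's hash() already applies per field with its fixed seed.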
[jira] [Commented] (SPARK-18737) Serialization setting "spark.serializer" ignored in Spark 2.x
[ https://issues.apache.org/jira/browse/SPARK-18737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955747#comment-15955747 ] Prasanth commented on SPARK-18737: -- We are using 2.0.1. Can you tell us how to "disable the Kryo auto-pick for streaming from the Java API" as a workaround? > Serialization setting "spark.serializer" ignored in Spark 2.x > - > > Key: SPARK-18737 > URL: https://issues.apache.org/jira/browse/SPARK-18737 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0, 2.0.1 >Reporter: Dr. Michael Menzel > > The following exception occurs although the JavaSerializer has been activated: > 16/11/22 10:49:24 INFO TaskSetManager: Starting task 0.0 in stage 9.0 (TID > 77, ip-10-121-14-147.eu-central-1.compute.internal, partition 1, RACK_LOCAL, > 5621 bytes) > 16/11/22 10:49:24 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching > task 77 on executor id: 2 hostname: > ip-10-121-14-147.eu-central-1.compute.internal. > 16/11/22 10:49:24 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory > on ip-10-121-14-147.eu-central-1.compute.internal:45059 (size: 879.0 B, free: > 410.4 MB) > 16/11/22 10:49:24 WARN TaskSetManager: Lost task 0.0 in stage 9.0 (TID 77, > ip-10-121-14-147.eu-central-1.compute.internal): > com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: > 13994 > at > com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137) > at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781) > at > org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:229) > at > org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:169) > at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > at 
org.apache.spark.util.NextIterator.foreach(NextIterator.scala:21) > at > scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48) > at > scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310) > at org.apache.spark.util.NextIterator.to(NextIterator.scala:21) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302) > at org.apache.spark.util.NextIterator.toBuffer(NextIterator.scala:21) > at > scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289) > at org.apache.spark.util.NextIterator.toArray(NextIterator.scala:21) > at > org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927) > at > org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927) > at > org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916) > at > org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at org.apache.spark.scheduler.Task.run(Task.scala:86) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > The code runs perfectly with Spark 1.6.0. Since we moved to 2.0.0 and now > 2.0.1, we see the Kryo deserialization exception, and over time the Spark > streaming job stops processing since too many tasks failed. 
> Our action was to use conf.set("spark.serializer", > "org.apache.spark.serializer.JavaSerializer") and to disable Kryo class > registration with conf.set("spark.kryo.registrationRequired", false). We hope > to identify the root cause of the exception. > However, setting the serializer to JavaSerializer is obviously ignored by the > Spark internals. Despite the setting we still see the exception printed in > the log and tasks fail. The occurrence seems to be non-deterministic, but to > become more frequent over time. > Several questions we could not answer during our troubleshooting: > 1. How can the debug log for Kryo be enabled? -- We tried following the > minilog documentation, but no output can be found. > 2. Is the serializer setting effective for Spark internal serializat
[jira] [Created] (SPARK-20213) DataFrameWriter operations do not show up in SQL tab
Ryan Blue created SPARK-20213: - Summary: DataFrameWriter operations do not show up in SQL tab Key: SPARK-20213 URL: https://issues.apache.org/jira/browse/SPARK-20213 Project: Spark Issue Type: Bug Components: SQL, Web UI Affects Versions: 2.1.0, 2.0.2 Reporter: Ryan Blue In 1.6.1, {{DataFrame}} writes started using {{DataFrameWriter}} actions like {{insertInto}} would show up in the SQL tab. In 2.0.0 and later, they no longer do. The problem is that 2.0.0 and later no longer wrap execution with {{SQLExecution.withNewExecutionId}}, which emits {{SparkListenerSQLExecutionStart}}. Here are the relevant parts of the stack traces: {code:title=Spark 1.6.1} org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130) org.apache.spark.sql.execution.QueryExecution$$anonfun$toRdd$1.apply(QueryExecution.scala:56) org.apache.spark.sql.execution.QueryExecution$$anonfun$toRdd$1.apply(QueryExecution.scala:56) org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:53) org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:56) => holding Monitor(org.apache.spark.sql.hive.HiveContext$QueryExecution@424773807}) org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55) org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:196) {code} {code:title=Spark 2.0.0} org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114) org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86) => holding Monitor(org.apache.spark.sql.execution.QueryExecution@490977924}) org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86) org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:301) {code} I think this was introduced by [54d23599|https://github.com/apache/spark/commit/54d23599]. 
The fix should be to add withNewExecutionId to https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L610 -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
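The missing {{SQLExecution.withNewExecutionId}} wrapper is what assigns an execution id and emits the {{SparkListenerSQLExecutionStart}}/{{End}} events the SQL tab listens for. A minimal pure-Python sketch of that bracketing pattern (the names here are stand-ins, not Spark's actual API):

```python
import itertools
from contextlib import contextmanager

# Hypothetical stand-ins for Spark's listener bus and SQL-tab event store.
_next_execution_id = itertools.count()
listener_events = []

@contextmanager
def with_new_execution_id(description):
    """Assign a fresh execution id and emit start/end events around the body,
    the way SQLExecution.withNewExecutionId brackets a query's execution."""
    execution_id = next(_next_execution_id)
    listener_events.append(("SQLExecutionStart", execution_id, description))
    try:
        yield execution_id
    finally:
        listener_events.append(("SQLExecutionEnd", execution_id))

def insert_into(table, rows):
    # The reported bug: running the write *without* this wrapper means no
    # start event is ever emitted, so the SQL tab never sees the query.
    with with_new_execution_id("insertInto(%s)" % table):
        return list(rows)  # stand-in for executing the physical plan
```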
[jira] [Created] (SPARK-20214) pyspark.mllib SciPyTests test_serialize
Joseph K. Bradley created SPARK-20214: - Summary: pyspark.mllib SciPyTests test_serialize Key: SPARK-20214 URL: https://issues.apache.org/jira/browse/SPARK-20214 Project: Spark Issue Type: Bug Components: ML, MLlib, PySpark, Tests Affects Versions: 2.0.2, 2.1.1, 2.2.0 Reporter: Joseph K. Bradley I've seen a few failures of this line: https://github.com/apache/spark/blame/402bf2a50ddd4039ff9f376b641bd18fffa54171/python/pyspark/mllib/tests.py#L847 It converts a scipy.sparse.lil_matrix to a dok_matrix and then to a pyspark.mllib.linalg.Vector. The failure happens in the conversion to a vector and indicates that the dok_matrix is not returning its values in sorted order. (Actually, the failure is in _convert_to_vector, which converts the dok_matrix to a csc_matrix and then passes the CSC data to the MLlib Vector constructor.) Here's the stack trace: {code} Traceback (most recent call last): File "/home/jenkins/workspace/python/pyspark/mllib/tests.py", line 847, in test_serialize self.assertEqual(sv, _convert_to_vector(lil.todok())) File "/home/jenkins/workspace/python/pyspark/mllib/linalg/__init__.py", line 78, in _convert_to_vector return SparseVector(l.shape[0], csc.indices, csc.data) File "/home/jenkins/workspace/python/pyspark/mllib/linalg/__init__.py", line 556, in __init__ % (self.indices[i], self.indices[i + 1])) TypeError: Indices 3 and 1 are not strictly increasing {code} This seems like a bug in _convert_to_vector, where we really should check {{csc_matrix.has_sorted_indices}} first. I haven't seen this bug in pyspark.ml.linalg, but it probably exists there too. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
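The failure mode and the proposed fix can be shown without scipy at all: a dict-of-keys structure yields entries in insertion order, and a constructor that requires strictly increasing indices rejects them unless the caller sorts first (which is what {{csc_matrix.sort_indices}} would do when {{has_sorted_indices}} is false). A toy sketch:

```python
# Pure-Python sketch of the failure mode (scipy itself not required):
# a dict-of-keys matrix iterates in insertion order, so the (index, value)
# pairs handed to a SparseVector-style constructor may be unsorted.

def make_sparse_vector(size, indices, values):
    """Mimics the MLlib SparseVector check that tripped the test."""
    for i in range(len(indices) - 1):
        if indices[i] >= indices[i + 1]:
            raise TypeError("Indices %d and %d are not strictly increasing"
                            % (indices[i], indices[i + 1]))
    return (size, list(indices), list(values))

dok = {3: 1.0, 1: 2.0}          # insertion order: index 3 before index 1

# The proposed fix: sort by index first, then build the vector.
pairs = sorted(dok.items())
indices = [i for i, _ in pairs]
values = [v for _, v in pairs]
vec = make_sparse_vector(4, indices, values)
```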
[jira] [Commented] (SPARK-20153) Support Multiple aws credentials in order to access multiple Hive on S3 table in spark application
[ https://issues.apache.org/jira/browse/SPARK-20153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955956#comment-15955956 ] Steve Loughran commented on SPARK-20153: I'm glad we are both in agreement about not using secrets in URLs. I'm afraid then, there's not much that can be done without upgrading to Hadoop 2.8.x JARs. You'll get a lot of other S3A speedups too, so it's worth upgrading for S3 IO performance as well as security. > Support Multiple aws credentials in order to access multiple Hive on S3 table > in spark application > --- > > Key: SPARK-20153 > URL: https://issues.apache.org/jira/browse/SPARK-20153 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.0.1, 2.1.0 >Reporter: Franck Tago >Priority: Minor > > I need to access multiple hive tables in my spark application where each hive > table is > 1- an external table with data sitting on S3 > 2- each table is owned by a different AWS user, so I need to provide different > AWS credentials. > I am familiar with setting the aws credentials in the hadoop configuration > object, but that does not really help me because I can only set one pair of > (fs.s3a.awsAccessKeyId, fs.s3a.awsSecretAccessKey) > From my research, there is no easy or elegant way to do this in Spark. > Why is that? > How do I address this use case? -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
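For reference, the Hadoop 2.8.x upgrade Steve mentions helps here because S3A gained per-bucket configuration, letting each bucket carry its own credentials via {{fs.s3a.bucket.<bucketname>.*}} keys. A sketch in {{spark-defaults.conf}} form (bucket names and key values below are placeholders):

```properties
# spark-defaults.conf -- Hadoop 2.8.x S3A per-bucket credentials (sketch).
# "team-a-data" / "team-b-data" and the key values are placeholders.
spark.hadoop.fs.s3a.bucket.team-a-data.access.key  AKIA...TEAMA
spark.hadoop.fs.s3a.bucket.team-a-data.secret.key  <team-a-secret>
spark.hadoop.fs.s3a.bucket.team-b-data.access.key  AKIA...TEAMB
spark.hadoop.fs.s3a.bucket.team-b-data.secret.key  <team-b-secret>
```

The `spark.hadoop.` prefix copies each setting into the Hadoop configuration, so two external tables backed by different buckets resolve different credentials from one SparkContext.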
[jira] [Created] (SPARK-20215) ReuseExchange is broken in SparkSQL
Zhan Zhang created SPARK-20215: -- Summary: ReuseExchange is broken in SparkSQL Key: SPARK-20215 URL: https://issues.apache.org/jira/browse/SPARK-20215 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.1.0 Reporter: Zhan Zhang Priority: Minor Currently if we have a query like: A join B Union A join C... with the same join key, table A will be scanned multiple times in SQL. This is because the MetastoreRelations are not shared by the two joins, and their ExprIds are different. canonicalized in Expression will not be able to unify them, so the two Exchanges are not compatible and cannot be reused. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
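The reuse check compares canonicalized plans, in which per-plan expression ids are renumbered in a stable order; when that renumbering succeeds, two scans of the same table compare equal and one Exchange can be reused. A toy sketch of the intended comparison (the plan representation here is made up, not Catalyst's):

```python
# Sketch of why exchange reuse depends on canonicalization: two scans of
# table A get distinct expression ids, and reuse only kicks in when the
# canonicalized plans compare equal.

def canonicalize(plan):
    """Renumber expression ids in first-appearance order, the way
    canonicalization normalizes ExprIds before comparing plans."""
    mapping = {}
    out = []
    for table, expr_id in plan:
        normalized = mapping.setdefault(expr_id, len(mapping))
        out.append((table, normalized))
    return tuple(out)

scan1 = [("A", 101)]   # relation A, ExprId 101
scan2 = [("A", 202)]   # same table, scanned again with a fresh ExprId

assert scan1 != scan2                              # raw plans differ...
assert canonicalize(scan1) == canonicalize(scan2)  # ...canonical forms match
```

The bug report is that the un-shared relations prevent exactly this unification, so the second scan is not recognized as reusable.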
[jira] [Comment Edited] (SPARK-14492) Spark SQL 1.6.0 does not work with external Hive metastore version lower than 1.2.0; it's not backwards compatible with earlier version
[ https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955510#comment-15955510 ] Sunil Rangwani edited comment on SPARK-14492 at 4/4/17 10:42 PM: - {quote}If this were true, that it doesn't work with 1.2.1 because of a NoSuchFieldError, then it would not have compiled vs Hive 1.2.1 even. Right? {quote} To clarify, I meant that if I left the value of the property as 1.2.1 but used an external metastore of an older version, that doesn't work. It gives a different error, but not NoSuchFieldError. {quote}The field you show was added in 1.2.0, according to the JIRA, not 1.2.1 Yet you say 1.2.0 works while 1.2.1 doesn't.{quote} All versions upwards of 1.2.0 do work fine. The minimum version of external metastore that works is 1.2.0 (I was just highlighting I didn't have to upgrade up to 1.2.1). All versions up to 1.2.0 give a NoSuchFieldError for different fields. was (Author: sunil.rangwani): {quote}If this were true, that it doesn't work with 1.2.1 because of a NoSuchFieldError, then it would not have compiled vs Hive 1.2.1 even. Right? {quote} Sorry, to clarify I meant if I left the value of the property as 1.2.1 but used an external metastore of an older version, that didn't work. It gives a different error I don't remember, but it was not NoSuchFieldError. {quote}The field you show was added in 1.2.0, according to the JIRA, not 1.2.1 Yet you say 1.2.0 works while 1.2.1 doesn't.{quote} All versions upwards of 1.2.0 do work fine. The minimum version of external metastore that works is 1.2.0 (I was just highlighting I didn't have to upgrade up to 1.2.1). All versions up to 1.2.0 give a NoSuchFieldError for different fields. 
> Spark SQL 1.6.0 does not work with external Hive metastore version lower than > 1.2.0; its not backwards compatible with earlier version > -- > > Key: SPARK-14492 > URL: https://issues.apache.org/jira/browse/SPARK-14492 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Sunil Rangwani >Priority: Critical > > Spark SQL when configured with a Hive version lower than 1.2.0 throws a > java.lang.NoSuchFieldError for the field METASTORE_CLIENT_SOCKET_LIFETIME > because this field was introduced in Hive 1.2.0 so its not possible to use > Hive metastore version lower than 1.2.0 with Spark. The details of the Hive > changes can be found here: https://issues.apache.org/jira/browse/HIVE-9508 > {code:java} > Exception in thread "main" java.lang.NoSuchFieldError: > METASTORE_CLIENT_SOCKET_LIFETIME > at > org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:500) > at > org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250) > at > org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237) > at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:441) > at > org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272) > at > org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at scala.collection.AbstractIterable.foreach(Iterable.scala:54) > at org.apache.spark.sql.SQLContext.(SQLContext.scala:271) > at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:90) > at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:101) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:58) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:267) > at > 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:139) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:12
[jira] [Created] (SPARK-20216) Install pandoc on machine(s) used for packaging
holdenk created SPARK-20216: --- Summary: Install pandoc on machine(s) used for packaging Key: SPARK-20216 URL: https://issues.apache.org/jira/browse/SPARK-20216 Project: Spark Issue Type: Bug Components: Project Infra, PySpark Affects Versions: 2.1.1, 2.2.0 Reporter: holdenk Priority: Blocker For Python packaging, pandoc is required to produce a reasonable package doc string. Whichever machine(s) are used for packaging should have both pandoc and pypandoc installed on them. cc [~joshrosen] who I know was doing something related to this -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20144) spark.read.parquet no longer maintains ordering of the data
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956030#comment-15956030 ] Li Jin commented on SPARK-20144: > When you save the sorted data into Parquet, only the data in individual > Parquet file can maintain the data ordering Is this something that changed in Spark 2 as well? Does write.parquet no longer write parquet files in the order of partitions? > spark.read.parquet no longer maintains ordering of the data > - > > Key: SPARK-20144 > URL: https://issues.apache.org/jira/browse/SPARK-20144 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2 >Reporter: Li Jin > > Hi, we are trying to upgrade Spark from 1.6.3 to 2.0.2. One issue we found is > that when we read parquet files in 2.0.2, the ordering of rows in the resulting > dataframe is not the same as the ordering of rows in the dataframe that the > parquet file was produced with. > This is because FileSourceStrategy.scala combines the parquet files into > fewer partitions and also reorders them. This breaks our workflows because > they assume the ordering of the data. > Is this considered a bug? Also, FileSourceStrategy and FileSourceScanExec > changed quite a bit from 2.0.2 to 2.1, so not sure if this is an issue with > 2.1. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
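The usual defense against this is to stop relying on file or partition order entirely and carry an explicit ordering column (e.g. attached with {{zipWithIndex}} before writing), then sort on it after reading. A language-agnostic sketch of the idea:

```python
# Sketch of the workaround: never rely on file/partition order, carry an
# explicit ordering column instead (e.g. via zipWithIndex before writing).

rows = ["a", "b", "c", "d"]
indexed = list(enumerate(rows))            # (row_index, value) pairs written out

# On read, FileSourceStrategy may combine and reorder partitions:
shuffled = [indexed[2], indexed[0], indexed[3], indexed[1]]

# Restoring the original order is then just a sort on the explicit index:
restored = [value for _, value in sorted(shuffled)]
assert restored == rows
```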
[jira] [Commented] (SPARK-20216) Install pandoc on machine(s) used for packaging
[ https://issues.apache.org/jira/browse/SPARK-20216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956045#comment-15956045 ] Michael Armbrust commented on SPARK-20216: -- I think it all runs on https://amplab.cs.berkeley.edu/jenkins/computer/amp-jenkins-worker-01/ > Install pandoc on machine(s) used for packaging > --- > > Key: SPARK-20216 > URL: https://issues.apache.org/jira/browse/SPARK-20216 > Project: Spark > Issue Type: Bug > Components: Project Infra, PySpark >Affects Versions: 2.1.1, 2.2.0 >Reporter: holdenk >Priority: Blocker > > For Python packaging, pandoc is required to produce a reasonable package > doc string. Whichever machine(s) are used for packaging should have both > pandoc and pypandoc installed on them. > cc [~joshrosen] who I know was doing something related to this -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-19716) Dataset should allow by-name resolution for struct type elements in array
[ https://issues.apache.org/jira/browse/SPARK-19716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian reassigned SPARK-19716: -- Assignee: Wenchen Fan > Dataset should allow by-name resolution for struct type elements in array > - > > Key: SPARK-19716 > URL: https://issues.apache.org/jira/browse/SPARK-19716 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.2.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan > > If we have a DataFrame with schema {{a: int, b: int, c: int}} and convert it > to a Dataset with {{case class Data(a: Int, c: Int)}}, it works and we will > extract the `a` and `c` columns to build the Data. > However, if the struct is inside an array, e.g. the schema is {{arr: array<struct<a: > int, b: int, c: int>>}} and we want to convert it to a Dataset with {{case class > ComplexData(arr: Seq[Data])}}, we will fail. The reason is that, to allow > compatible types, e.g. converting {{a: int}} to {{case class A(a: Long)}}, we > add a cast for each field, except for struct type fields, because a struct type > is flexible: the number of columns can mismatch. We should probably also skip > the cast for array and map types. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
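The resolution rule described above — pick target fields out of a wider row by name, casting simple types but skipping the cast for struct/array/map values — can be sketched as follows (a toy model, not Catalyst's implementation):

```python
# Toy sketch: build Data(a, c) from a wider row {a, b, c} by name, applying
# a cast for simple types but leaving complex (struct/array/map) values
# uncast, since their shape may legitimately mismatch the target.

CASTS = {("int", "long"): int}   # toy cast table; Python ints are already long

def extract(row, wanted):
    """row: {name: (type_name, value)}; wanted: {name: target_type_name}."""
    out = {}
    for name, target_type in wanted.items():
        type_name, value = row[name]
        if type_name in ("struct", "array", "map"):
            out[name] = value                    # skip the cast entirely
        else:
            cast = CASTS.get((type_name, target_type), lambda v: v)
            out[name] = cast(value)
    return out

row = {"a": ("int", 1), "b": ("int", 2), "c": ("int", 3)}
data = extract(row, {"a": "long", "c": "long"})  # by-name: b is ignored
```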
[jira] [Resolved] (SPARK-19716) Dataset should allow by-name resolution for struct type elements in array
[ https://issues.apache.org/jira/browse/SPARK-19716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-19716. Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 17398 [https://github.com/apache/spark/pull/17398] > Dataset should allow by-name resolution for struct type elements in array > - > > Key: SPARK-19716 > URL: https://issues.apache.org/jira/browse/SPARK-19716 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.2.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan > Fix For: 2.3.0 > > > If we have a DataFrame with schema {{a: int, b: int, c: int}} and convert it > to a Dataset with {{case class Data(a: Int, c: Int)}}, it works and we will > extract the `a` and `c` columns to build the Data. > However, if the struct is inside an array, e.g. the schema is {{arr: array<struct<a: > int, b: int, c: int>>}} and we want to convert it to a Dataset with {{case class > ComplexData(arr: Seq[Data])}}, we will fail. The reason is that, to allow > compatible types, e.g. converting {{a: int}} to {{case class A(a: Long)}}, we > add a cast for each field, except for struct type fields, because a struct type > is flexible: the number of columns can mismatch. We should probably also skip > the cast for array and map types. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-19716) Dataset should allow by-name resolution for struct type elements in array
[ https://issues.apache.org/jira/browse/SPARK-19716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-19716: --- Fix Version/s: (was: 2.3.0) 2.2.0 > Dataset should allow by-name resolution for struct type elements in array > - > > Key: SPARK-19716 > URL: https://issues.apache.org/jira/browse/SPARK-19716 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.2.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan > Fix For: 2.2.0 > > > If we have a DataFrame with schema {{a: int, b: int, c: int}} and convert it > to a Dataset with {{case class Data(a: Int, c: Int)}}, it works and we will > extract the `a` and `c` columns to build the Data. > However, if the struct is inside an array, e.g. the schema is {{arr: array<struct<a: > int, b: int, c: int>>}} and we want to convert it to a Dataset with {{case class > ComplexData(arr: Seq[Data])}}, we will fail. The reason is that, to allow > compatible types, e.g. converting {{a: int}} to {{case class A(a: Long)}}, we > add a cast for each field, except for struct type fields, because a struct type > is flexible: the number of columns can mismatch. We should probably also skip > the cast for array and map types. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-20217) Executor should not fail stage if killed task throws non-interrupted exception
Eric Liang created SPARK-20217: -- Summary: Executor should not fail stage if killed task throws non-interrupted exception Key: SPARK-20217 URL: https://issues.apache.org/jira/browse/SPARK-20217 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.2.0 Reporter: Eric Liang This is reproducible as follows. Run the following, and then use SparkContext.killTaskAttempt to kill one of the tasks. The entire stage will fail since we threw a RuntimeException instead of InterruptedException. We should probably unconditionally return TaskKilled instead of TaskFailed if the task was killed by the driver, regardless of the actual exception thrown. {code} spark.range(100).repartition(100).foreach { i => try { Thread.sleep(1000) } catch { case t: InterruptedException => throw new RuntimeException(t) } } {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
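The proposed change — classify the outcome by the driver's kill flag rather than by the exception type that escaped the task — looks roughly like this in a toy task runner (pure Python, not Spark's executor code):

```python
import threading

# Sketch of the proposed behavior: classify a task's outcome by the driver's
# kill flag, not by which exception type escaped the task body.

def run_task(body, kill_flag):
    try:
        body()
        return "Success"
    except BaseException:
        # The repro wraps InterruptedException in a RuntimeException; if we
        # classified by exception type, this would read as a task *failure*
        # and fail the stage. Checking the kill flag first yields TaskKilled.
        return "TaskKilled" if kill_flag.is_set() else "TaskFailed"

kill_flag = threading.Event()

def body():
    if kill_flag.is_set():        # stand-in for the interrupted sleep
        raise RuntimeError("wrapped interrupt")

kill_flag.set()                   # driver kills the task attempt
assert run_task(body, kill_flag) == "TaskKilled"
```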
[jira] [Updated] (SPARK-20217) Executor should not fail stage if killed task throws non-interrupted exception
[ https://issues.apache.org/jira/browse/SPARK-20217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Liang updated SPARK-20217: --- Description: This is reproducible as follows. Run the following, and then use SparkContext.killTaskAttempt to kill one of the tasks. The entire stage will fail since we threw a RuntimeException instead of InterruptedException. We should probably unconditionally return TaskKilled instead of TaskFailed if the task was killed by the driver, regardless of the actual exception thrown. {code} spark.range(100).repartition(100).foreach { i => try { Thread.sleep(1000) } catch { case t: InterruptedException => throw new RuntimeException(t) } } {code} Based on the code in TaskSetManager, I think this also affects kills of speculative tasks. However, since the number of speculated tasks is few, and usually you need to fail a task a few times before the stage is cancelled, probably no-one noticed this in production. was: This is reproducible as follows. Run the following, and then use SparkContext.killTaskAttempt to kill one of the tasks. The entire stage will fail since we threw a RuntimeException instead of InterruptedException. We should probably unconditionally return TaskKilled instead of TaskFailed if the task was killed by the driver, regardless of the actual exception thrown. {code} spark.range(100).repartition(100).foreach { i => try { Thread.sleep(1000) } catch { case t: InterruptedException => throw new RuntimeException(t) } } {code} > Executor should not fail stage if killed task throws non-interrupted exception > -- > > Key: SPARK-20217 > URL: https://issues.apache.org/jira/browse/SPARK-20217 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Eric Liang > > This is reproducible as follows. Run the following, and then use > SparkContext.killTaskAttempt to kill one of the tasks. 
The entire stage will > fail since we threw a RuntimeException instead of InterruptedException. > We should probably unconditionally return TaskKilled instead of TaskFailed if > the task was killed by the driver, regardless of the actual exception thrown. > {code} > spark.range(100).repartition(100).foreach { i => > try { > Thread.sleep(1000) > } catch { > case t: InterruptedException => > throw new RuntimeException(t) > } > } > {code} > Based on the code in TaskSetManager, I think this also affects kills of > speculative tasks. However, since the number of speculated tasks is few, and > usually you need to fail a task a few times before the stage is cancelled, > probably no-one noticed this in production. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-20183) Add outlierRatio option to testOutliersWithSmallWeights
[ https://issues.apache.org/jira/browse/SPARK-20183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley resolved SPARK-20183. --- Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 17501 [https://github.com/apache/spark/pull/17501] > Add outlierRatio option to testOutliersWithSmallWeights > --- > > Key: SPARK-20183 > URL: https://issues.apache.org/jira/browse/SPARK-20183 > Project: Spark > Issue Type: Sub-task > Components: ML, Tests >Affects Versions: 2.1.0 >Reporter: Joseph K. Bradley >Assignee: Seth Hendrickson > Fix For: 2.2.0 > > > Part 1 of parent PR: Add flexibility to testOutliersWithSmallWeights test. > See https://github.com/apache/spark/pull/16722 for perspective. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-20217) Executor should not fail stage if killed task throws non-interrupted exception
[ https://issues.apache.org/jira/browse/SPARK-20217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20217: Assignee: Apache Spark > Executor should not fail stage if killed task throws non-interrupted exception > -- > > Key: SPARK-20217 > URL: https://issues.apache.org/jira/browse/SPARK-20217 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Eric Liang >Assignee: Apache Spark > > This is reproducible as follows. Run the following, and then use > SparkContext.killTaskAttempt to kill one of the tasks. The entire stage will > fail since we threw a RuntimeException instead of InterruptedException. > We should probably unconditionally return TaskKilled instead of TaskFailed if > the task was killed by the driver, regardless of the actual exception thrown. > {code} > spark.range(100).repartition(100).foreach { i => > try { > Thread.sleep(1000) > } catch { > case t: InterruptedException => > throw new RuntimeException(t) > } > } > {code} > Based on the code in TaskSetManager, I think this also affects kills of > speculative tasks. However, since the number of speculated tasks is few, and > usually you need to fail a task a few times before the stage is cancelled, > probably no-one noticed this in production. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20217) Executor should not fail stage if killed task throws non-interrupted exception
[ https://issues.apache.org/jira/browse/SPARK-20217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956093#comment-15956093 ] Apache Spark commented on SPARK-20217: -- User 'ericl' has created a pull request for this issue: https://github.com/apache/spark/pull/17531 > Executor should not fail stage if killed task throws non-interrupted exception > -- > > Key: SPARK-20217 > URL: https://issues.apache.org/jira/browse/SPARK-20217 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Eric Liang > > This is reproducible as follows. Run the following, and then use > SparkContext.killTaskAttempt to kill one of the tasks. The entire stage will > fail since we threw a RuntimeException instead of InterruptedException. > We should probably unconditionally return TaskKilled instead of TaskFailed if > the task was killed by the driver, regardless of the actual exception thrown. > {code} > spark.range(100).repartition(100).foreach { i => > try { > Thread.sleep(1000) > } catch { > case t: InterruptedException => > throw new RuntimeException(t) > } > } > {code} > Based on the code in TaskSetManager, I think this also affects kills of > speculative tasks. However, since the number of speculated tasks is few, and > usually you need to fail a task a few times before the stage is cancelled, > probably no-one noticed this in production. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-20217) Executor should not fail stage if killed task throws non-interrupted exception
[ https://issues.apache.org/jira/browse/SPARK-20217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20217: Assignee: (was: Apache Spark) > Executor should not fail stage if killed task throws non-interrupted exception > -- > > Key: SPARK-20217 > URL: https://issues.apache.org/jira/browse/SPARK-20217 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Eric Liang > > This is reproducible as follows. Run the following, and then use > SparkContext.killTaskAttempt to kill one of the tasks. The entire stage will > fail since we threw a RuntimeException instead of InterruptedException. > We should probably unconditionally return TaskKilled instead of TaskFailed if > the task was killed by the driver, regardless of the actual exception thrown. > {code} > spark.range(100).repartition(100).foreach { i => > try { > Thread.sleep(1000) > } catch { > case t: InterruptedException => > throw new RuntimeException(t) > } > } > {code} > Based on the code in TaskSetManager, I think this also affects kills of > speculative tasks. However, since the number of speculated tasks is few, and > usually you need to fail a task a few times before the stage is cancelled, > probably no-one noticed this in production. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-20003) FPGrowthModel setMinConfidence should affect rules generation and transform
[ https://issues.apache.org/jira/browse/SPARK-20003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley resolved SPARK-20003. --- Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 17336 [https://github.com/apache/spark/pull/17336] > FPGrowthModel setMinConfidence should affect rules generation and transform > --- > > Key: SPARK-20003 > URL: https://issues.apache.org/jira/browse/SPARK-20003 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.2.0 >Reporter: yuhao yang >Assignee: yuhao yang >Priority: Minor > Fix For: 2.2.0 > > > I was doing some testing and found this issue: FPGrowthModel setMinConfidence > should affect rules generation and transform. > Currently associationRules in FPGrowthModel is a lazy val, and > setMinConfidence in FPGrowthModel has no effect once associationRules has > been computed. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
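The lazy-val caching behavior behind this bug can be sketched in a few lines of plain Scala (class and member names below are illustrative, not the real FPGrowthModel API): a `lazy val` is computed once on first access and cached, so a setter called afterwards is silently ignored.

```scala
class ModelSketch {
  private var minConfidence = 0.8
  def setMinConfidence(v: Double): this.type = { minConfidence = v; this }
  // Computed on first access, then cached forever -- later setter calls
  // cannot influence it:
  lazy val associationRules: Double = minConfidence
}

val m = new ModelSketch
val before = m.associationRules   // forces the lazy val with 0.8
m.setMinConfidence(0.1)           // too late: the cached value is unchanged
val after = m.associationRules
// before == 0.8 && after == 0.8
```

The fix referenced above makes rule generation respect the current parameter value instead of relying on a once-computed cache.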
[jira] [Commented] (SPARK-20207) Add ability to exclude current row in WindowSpec
[ https://issues.apache.org/jira/browse/SPARK-20207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956134#comment-15956134 ] Satyajit varma commented on SPARK-20207: I can take up this ticket and will keep posting updates on my progress, as this is my first attempt at jumping into the Spark SQL codebase. > Add ability to exclude current row in WindowSpec > --- > > Key: SPARK-20207 > URL: https://issues.apache.org/jira/browse/SPARK-20207 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.0 >Reporter: Mathew Wicks >Priority: Minor > > It would be useful if we could implement a way to exclude the current row in > WindowSpec. (We can currently only select ranges of rows/time.) > Currently, users have to resort to ridiculous measures to exclude the current > row from windowing aggregations. > As seen here: > http://stackoverflow.com/questions/43180723/spark-sql-excluding-the-current-row-in-partition-by-windowing-functions/43198839#43198839 -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-20184) performance regression for complex/long sql when enable codegen
[ https://issues.apache.org/jira/browse/SPARK-20184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang updated SPARK-20184: - Description: The performance of following SQL get much worse in spark 2.x in contrast with codegen off. SELECT sum(COUNTER_57) ,sum(COUNTER_71) ,sum(COUNTER_3) ,sum(COUNTER_70) ,sum(COUNTER_66) ,sum(COUNTER_75) ,sum(COUNTER_69) ,sum(COUNTER_55) ,sum(COUNTER_63) ,sum(COUNTER_68) ,sum(COUNTER_56) ,sum(COUNTER_37) ,sum(COUNTER_51) ,sum(COUNTER_42) ,sum(COUNTER_43) ,sum(COUNTER_1) ,sum(COUNTER_76) ,sum(COUNTER_54) ,sum(COUNTER_44) ,sum(COUNTER_46) ,DIM_1 ,DIM_2 ,DIM_3 FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; Num of rows of aggtable is about 3500. whole stage codegen on(spark.sql.codegen.wholeStage = true):40s whole stage codege off(spark.sql.codegen.wholeStage = false):6s After some analysis i think this is related to the huge java method(a java method of thousand lines) which generated by codegen. And If i config -XX:-DontCompileHugeMethods the performance get much better(about 7s). was: The performance of following SQL get much worse in spark 2.x in contrast with codegen off. SELECT sum(COUNTER_57) ,sum(COUNTER_71) ,sum(COUNTER_3) ,sum(COUNTER_70) ,sum(COUNTER_66) ,sum(COUNTER_75) ,sum(COUNTER_69) ,sum(COUNTER_55) ,sum(COUNTER_63) ,sum(COUNTER_68) ,sum(COUNTER_56) ,sum(COUNTER_37) ,sum(COUNTER_51) ,sum(COUNTER_42) ,sum(COUNTER_43) ,sum(COUNTER_1) ,sum(COUNTER_76) ,sum(COUNTER_54) ,sum(COUNTER_44) ,sum(COUNTER_46) ,DIM_1 ,DIM_2 ,DIM_3 FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; Num of rows of aggtable is about 3500. codegen on:40s codegen off:6s After some analysis i think this is related to the huge java method(a java method of thousand lines) which generated by codegen. And If i config -XX:-DontCompileHugeMethods the performance get much better(about 7s). 
> performance regression for complex/long sql when enable codegen > --- > > Key: SPARK-20184 > URL: https://issues.apache.org/jira/browse/SPARK-20184 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.0, 2.1.0 >Reporter: Fei Wang > > The performance of following SQL get much worse in spark 2.x in contrast > with codegen off. > SELECT >sum(COUNTER_57) > ,sum(COUNTER_71) > ,sum(COUNTER_3) > ,sum(COUNTER_70) > ,sum(COUNTER_66) > ,sum(COUNTER_75) > ,sum(COUNTER_69) > ,sum(COUNTER_55) > ,sum(COUNTER_63) > ,sum(COUNTER_68) > ,sum(COUNTER_56) > ,sum(COUNTER_37) > ,sum(COUNTER_51) > ,sum(COUNTER_42) > ,sum(COUNTER_43) > ,sum(COUNTER_1) > ,sum(COUNTER_76) > ,sum(COUNTER_54) > ,sum(COUNTER_44) > ,sum(COUNTER_46) > ,DIM_1 > ,DIM_2 > ,DIM_3 > FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; > Num of rows of aggtable is about 3500. > whole stage codegen on(spark.sql.codegen.wholeStage = true):40s > whole stage codege off(spark.sql.codegen.wholeStage = false):6s > After some analysis i think this is related to the huge java method(a java > method of thousand lines) which generated by codegen. > And If i config -XX:-DontCompileHugeMethods the performance get much > better(about 7s). -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-20184) performance regression for complex/long sql when enable codegen
[ https://issues.apache.org/jira/browse/SPARK-20184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang updated SPARK-20184: - Description: The performance of following SQL get much worse in spark 2.x in contrast with codegen off. SELECT sum(COUNTER_57) ,sum(COUNTER_71) ,sum(COUNTER_3) ,sum(COUNTER_70) ,sum(COUNTER_66) ,sum(COUNTER_75) ,sum(COUNTER_69) ,sum(COUNTER_55) ,sum(COUNTER_63) ,sum(COUNTER_68) ,sum(COUNTER_56) ,sum(COUNTER_37) ,sum(COUNTER_51) ,sum(COUNTER_42) ,sum(COUNTER_43) ,sum(COUNTER_1) ,sum(COUNTER_76) ,sum(COUNTER_54) ,sum(COUNTER_44) ,sum(COUNTER_46) ,DIM_1 ,DIM_2 ,DIM_3 FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; Num of rows of aggtable is about 3500. whole stage codegen on(spark.sql.codegen.wholeStage = true):40s whole stage codegen off(spark.sql.codegen.wholeStage = false):6s After some analysis i think this is related to the huge java method(a java method of thousand lines) which generated by codegen. And If i config -XX:-DontCompileHugeMethods the performance get much better(about 7s). was: The performance of following SQL get much worse in spark 2.x in contrast with codegen off. SELECT sum(COUNTER_57) ,sum(COUNTER_71) ,sum(COUNTER_3) ,sum(COUNTER_70) ,sum(COUNTER_66) ,sum(COUNTER_75) ,sum(COUNTER_69) ,sum(COUNTER_55) ,sum(COUNTER_63) ,sum(COUNTER_68) ,sum(COUNTER_56) ,sum(COUNTER_37) ,sum(COUNTER_51) ,sum(COUNTER_42) ,sum(COUNTER_43) ,sum(COUNTER_1) ,sum(COUNTER_76) ,sum(COUNTER_54) ,sum(COUNTER_44) ,sum(COUNTER_46) ,DIM_1 ,DIM_2 ,DIM_3 FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; Num of rows of aggtable is about 3500. whole stage codegen on(spark.sql.codegen.wholeStage = true):40s whole stage codege off(spark.sql.codegen.wholeStage = false):6s After some analysis i think this is related to the huge java method(a java method of thousand lines) which generated by codegen. And If i config -XX:-DontCompileHugeMethods the performance get much better(about 7s). 
> performance regression for complex/long sql when enable codegen > --- > > Key: SPARK-20184 > URL: https://issues.apache.org/jira/browse/SPARK-20184 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.0, 2.1.0 >Reporter: Fei Wang > > The performance of following SQL get much worse in spark 2.x in contrast > with codegen off. > SELECT >sum(COUNTER_57) > ,sum(COUNTER_71) > ,sum(COUNTER_3) > ,sum(COUNTER_70) > ,sum(COUNTER_66) > ,sum(COUNTER_75) > ,sum(COUNTER_69) > ,sum(COUNTER_55) > ,sum(COUNTER_63) > ,sum(COUNTER_68) > ,sum(COUNTER_56) > ,sum(COUNTER_37) > ,sum(COUNTER_51) > ,sum(COUNTER_42) > ,sum(COUNTER_43) > ,sum(COUNTER_1) > ,sum(COUNTER_76) > ,sum(COUNTER_54) > ,sum(COUNTER_44) > ,sum(COUNTER_46) > ,DIM_1 > ,DIM_2 > ,DIM_3 > FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; > Num of rows of aggtable is about 3500. > whole stage codegen on(spark.sql.codegen.wholeStage = true):40s > whole stage codegen off(spark.sql.codegen.wholeStage = false):6s > After some analysis i think this is related to the huge java method(a java > method of thousand lines) which generated by codegen. > And If i config -XX:-DontCompileHugeMethods the performance get much > better(about 7s). -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-20184) performance regression for complex/long sql when enable whole stage codegen
[ https://issues.apache.org/jira/browse/SPARK-20184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang updated SPARK-20184: - Summary: performance regression for complex/long sql when enable whole stage codegen (was: performance regression for complex/long sql when enable codegen) > performance regression for complex/long sql when enable whole stage codegen > --- > > Key: SPARK-20184 > URL: https://issues.apache.org/jira/browse/SPARK-20184 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.0, 2.1.0 >Reporter: Fei Wang > > The performance of the following SQL gets much worse in Spark 2.x than with > codegen off. > SELECT >sum(COUNTER_57) > ,sum(COUNTER_71) > ,sum(COUNTER_3) > ,sum(COUNTER_70) > ,sum(COUNTER_66) > ,sum(COUNTER_75) > ,sum(COUNTER_69) > ,sum(COUNTER_55) > ,sum(COUNTER_63) > ,sum(COUNTER_68) > ,sum(COUNTER_56) > ,sum(COUNTER_37) > ,sum(COUNTER_51) > ,sum(COUNTER_42) > ,sum(COUNTER_43) > ,sum(COUNTER_1) > ,sum(COUNTER_76) > ,sum(COUNTER_54) > ,sum(COUNTER_44) > ,sum(COUNTER_46) > ,DIM_1 > ,DIM_2 > ,DIM_3 > FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; > The number of rows in aggtable is about 3500. > whole stage codegen on (spark.sql.codegen.wholeStage = true): 40s > whole stage codegen off (spark.sql.codegen.wholeStage = false): 6s > After some analysis I think this is related to the huge Java method (a method > thousands of lines long) generated by codegen. > If I set -XX:-DontCompileHugeMethods the performance gets much better (about > 7s). -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
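Building on the reporter's finding, a sketch of how that JVM flag could be passed to a Spark application (the application jar name is a placeholder; whether this workaround is advisable in production is a separate question, since it asks HotSpot to JIT-compile arbitrarily large methods):

```shell
# By default HotSpot skips JIT compilation of methods whose bytecode exceeds
# the HugeMethodLimit (~8 KB), so a huge codegen'd method runs interpreted.
# -XX:-DontCompileHugeMethods disables that cutoff for driver and executors.
spark-submit \
  --conf "spark.driver.extraJavaOptions=-XX:-DontCompileHugeMethods" \
  --conf "spark.executor.extraJavaOptions=-XX:-DontCompileHugeMethods" \
  your-application.jar
```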
[jira] [Commented] (SPARK-18971) Netty issue may cause the shuffle client to hang
[ https://issues.apache.org/jira/browse/SPARK-18971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956209#comment-15956209 ] Shixiong Zhu commented on SPARK-18971: -- It's not backported because there are too many changes in the new Netty version and it's too risky to upgrade in a maintenance release. We may backport it to 2.1 if we don't see any regression after Spark 2.2.0 is released. > Netty issue may cause the shuffle client to hang > - > > Key: SPARK-18971 > URL: https://issues.apache.org/jira/browse/SPARK-18971 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Minor > Fix For: 2.2.0 > > > Check https://github.com/netty/netty/issues/6153 for details > You should be able to see a similar stack trace in the executor > thread dump. > {code} > "shuffle-client-7-4" daemon prio=5 tid=97 RUNNABLE > at io.netty.util.Recycler$Stack.scavengeSome(Recycler.java:504) > at io.netty.util.Recycler$Stack.scavenge(Recycler.java:454) > at io.netty.util.Recycler$Stack.pop(Recycler.java:435) > at io.netty.util.Recycler.get(Recycler.java:144) > at > io.netty.buffer.PooledUnsafeDirectByteBuf.newInstance(PooledUnsafeDirectByteBuf.java:39) > at > io.netty.buffer.PoolArena$DirectArena.newByteBuf(PoolArena.java:727) > at io.netty.buffer.PoolArena.allocate(PoolArena.java:140) > at > io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:271) > at > io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:177) > at > io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:168) > at > io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:129) > at > io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl.allocate(AdaptiveRecvByteBufAllocator.java:104) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:117) > at > 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:652) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:575) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:489) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:451) > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140) > at > io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-20218) '/applications/[app-id]/stages' in REST API, add description.
guoxiaolongzte created SPARK-20218: -- Summary: '/applications/[app-id]/stages' in REST API, add description. Key: SPARK-20218 URL: https://issues.apache.org/jira/browse/SPARK-20218 Project: Spark Issue Type: Improvement Components: Documentation Affects Versions: 2.1.0 Reporter: guoxiaolongzte Priority: Minor The '/applications/[app-id]/stages' endpoint in the REST API should document its status parameter: '?status=[active|complete|pending|failed] list only stages in the given state.' Because this description is missing, users of this API do not know that they can filter the returned stage list by status. code: @GET def stageList(@QueryParam("status") statuses: JList[StageStatus]): Seq[StageData] = { val listener = ui.jobProgressListener val stageAndStatus = AllStagesResource.stagesAndStatus(ui) val adjStatuses = { if (statuses.isEmpty()) { Arrays.asList(StageStatus.values(): _*) } else { statuses } }; -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
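The default-status logic in the quoted code can be sketched in plain Scala (strings standing in for the real StageStatus enum) to show exactly what the missing documentation should say: an empty `?status=` list means "all statuses".

```scala
// Simplified sketch of the endpoint's filtering behavior: when no status
// is requested, every status is included; otherwise only the requested
// statuses filter the stage list.
val allStatuses = Seq("ACTIVE", "COMPLETE", "PENDING", "FAILED")

def effectiveStatuses(requested: Seq[String]): Seq[String] =
  if (requested.isEmpty) allStatuses else requested

// effectiveStatuses(Nil) == allStatuses
// effectiveStatuses(Seq("ACTIVE")) == Seq("ACTIVE")
```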
[jira] [Commented] (SPARK-20135) spark thriftserver2: no job running but containers not released on yarn
[ https://issues.apache.org/jira/browse/SPARK-20135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956235#comment-15956235 ] bruce xu commented on SPARK-20135: -- OK, Thanks. > spark thriftserver2: no job running but containers not released on yarn > -- > > Key: SPARK-20135 > URL: https://issues.apache.org/jira/browse/SPARK-20135 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.1 > Environment: spark 2.0.1 with hadoop 2.6.0 >Reporter: bruce xu > Attachments: 0329-1.png, 0329-2.png, 0329-3.png > > > I enabled the executor dynamic allocation feature; however, it sometimes > doesn't work. > I set the initial executor count to 50; after the job finished, the core and > memory resources were not released. > From the Spark web UI, the active job/running task/stage count is 0, but the > executors page shows 1276 cores and 7288 active tasks. > From the YARN web UI, the thriftserver job still holds 639 running containers > without releasing them. > This may be a bug. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org