[jira] [Commented] (SPARK-16007) Empty DataFrame created with spark.read.csv() does not respect user specified schema
[ https://issues.apache.org/jira/browse/SPARK-16007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363825#comment-15363825 ] Reynold Xin commented on SPARK-16007: - Is this done yet? > Empty DataFrame created with spark.read.csv() does not respect user specified > schema > > > Key: SPARK-16007 > URL: https://issues.apache.org/jira/browse/SPARK-16007 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Minor > > {{spark.schema(someSchema).csv().schema != someSchema}} > The schema of the empty DF created with {{csv()}} has no fields. > This problem will occur for json, text, parquet, orc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16007) Empty DataFrame created with spark.read.csv() does not respect user specified schema
[ https://issues.apache.org/jira/browse/SPARK-16007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-16007: Target Version/s: 2.1.0 (was: 2.0.0) > Empty DataFrame created with spark.read.csv() does not respect user specified > schema > > > Key: SPARK-16007 > URL: https://issues.apache.org/jira/browse/SPARK-16007 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Minor > > {{spark.schema(someSchema).csv().schema != someSchema}} > The schema of the empty DF created with {{csv()}} has no fields. > This problem will occur for json, text, parquet, orc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16387) Reserved SQL words are not escaped by JDBC writer
[ https://issues.apache.org/jira/browse/SPARK-16387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363817#comment-15363817 ] Dongjoon Hyun commented on SPARK-16387: --- Then, could you make a PR for this? > Reserved SQL words are not escaped by JDBC writer > - > > Key: SPARK-16387 > URL: https://issues.apache.org/jira/browse/SPARK-16387 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Lev > > Here is a code (imports are omitted) > object Main extends App { > val sqlSession = SparkSession.builder().config(new SparkConf(). > setAppName("Sql Test").set("spark.app.id", "SQLTest"). > set("spark.master", "local[2]"). > set("spark.ui.enabled", "false") > .setJars(Seq("/mysql/mysql-connector-java-5.1.38.jar" )) > ).getOrCreate() > import sqlSession.implicits._ > val localprops = new Properties > localprops.put("user", "") > localprops.put("password", "") > val df = sqlSession.createDataset(Seq("a","b","c")).toDF("order") > val writer = df.write > .mode(SaveMode.Append) > writer > .jdbc("jdbc:mysql://localhost:3306/test3", s"jira_test", localprops) > } > End error is : > com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error > in your SQL syntax; check the manual that corresponds to your MySQL server > version for the right syntax to use near 'order TEXT )' at line 1 > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > Clearly the reserved word has to be quoted -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16387) Reserved SQL words are not escaped by JDBC writer
[ https://issues.apache.org/jira/browse/SPARK-16387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363802#comment-15363802 ] Lev commented on SPARK-16387: - The JdbcDialect class has functionality that allows DB-dependent quotation. Please note that quotation has to be applied to all SQL statement generation. > Reserved SQL words are not escaped by JDBC writer > - > > Key: SPARK-16387 > URL: https://issues.apache.org/jira/browse/SPARK-16387 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Lev > > Here is a code (imports are omitted) > object Main extends App { > val sqlSession = SparkSession.builder().config(new SparkConf(). > setAppName("Sql Test").set("spark.app.id", "SQLTest"). > set("spark.master", "local[2]"). > set("spark.ui.enabled", "false") > .setJars(Seq("/mysql/mysql-connector-java-5.1.38.jar" )) > ).getOrCreate() > import sqlSession.implicits._ > val localprops = new Properties > localprops.put("user", "") > localprops.put("password", "") > val df = sqlSession.createDataset(Seq("a","b","c")).toDF("order") > val writer = df.write > .mode(SaveMode.Append) > writer > .jdbc("jdbc:mysql://localhost:3306/test3", s"jira_test", localprops) > } > End error is : > com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error > in your SQL syntax; check the manual that corresponds to your MySQL server > version for the right syntax to use near 'order TEXT )' at line 1 > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > Clearly the reserved word has to be quoted -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For
additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-16389) Remove useless `MetastoreRelation` from `SparkHiveWriterContainer` and `SparkHiveDynamicPartitionWriterContainer`
[ https://issues.apache.org/jira/browse/SPARK-16389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-16389. - Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14062 [https://github.com/apache/spark/pull/14062] > Remove useless `MetastoreRelation` from `SparkHiveWriterContainer` and > `SparkHiveDynamicPartitionWriterContainer` > - > > Key: SPARK-16389 > URL: https://issues.apache.org/jira/browse/SPARK-16389 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Xiao Li >Priority: Minor > Fix For: 2.1.0 > > > - Remove useless `MetastoreRelation` from the signature of > `SparkHiveWriterContainer` and `SparkHiveDynamicPartitionWriterContainer`. > - Avoid extra metadata retrieval using Hive client in `InsertIntoHiveTable`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-16340) In regexp_replace function column and/or column expression should also allowed as replacement.
[ https://issues.apache.org/jira/browse/SPARK-16340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-16340. - Resolution: Fixed Assignee: Dongjoon Hyun Fix Version/s: 2.1.0 > In regexp_replace function column and/or column expression should also > allowed as replacement. > -- > > Key: SPARK-16340 > URL: https://issues.apache.org/jira/browse/SPARK-16340 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.2 >Reporter: Mukul Garg >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 2.1.0 > > Original Estimate: 24h > Remaining Estimate: 24h > > Currently, regexp_replace function only take string argument as replacement, > but in Hive it also accept any column or column expression. It also works in > Spark, but as a query. Need to create a overload of this function which also > accept Column as replacement. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16340) In regexp_replace function column and/or column expression should also allowed as replacement.
[ https://issues.apache.org/jira/browse/SPARK-16340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363775#comment-15363775 ] Mukul Garg commented on SPARK-16340: I have checked the PR. This is what I requested. Thanks for taking this. :) > In regexp_replace function column and/or column expression should also > allowed as replacement. > -- > > Key: SPARK-16340 > URL: https://issues.apache.org/jira/browse/SPARK-16340 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.2 >Reporter: Mukul Garg >Priority: Minor > Original Estimate: 24h > Remaining Estimate: 24h > > Currently, regexp_replace function only take string argument as replacement, > but in Hive it also accept any column or column expression. It also works in > Spark, but as a query. Need to create a overload of this function which also > accept Column as replacement. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16371) IS NOT NULL clause gives false for nested not empty column
[ https://issues.apache.org/jira/browse/SPARK-16371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363768#comment-15363768 ] Hyukjin Kwon commented on SPARK-16371: -- Sorry for being noisy, here is the Scala version {code} case class Parent(a: Child) case class Child(a: Long) spark.range(10).map(num => Parent(Child(num))).write.mode("overwrite").parquet("/tmp/test") spark.read.parquet("/tmp/test").where("a is not null").count() // 0 {code} It seems it fails to apply the Parquet filter when the inner and outer column names are the same. I will look into this deeper. > IS NOT NULL clause gives false for nested not empty column > -- > > Key: SPARK-16371 > URL: https://issues.apache.org/jira/browse/SPARK-16371 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Maciej Bryński >Priority: Critical > > I have df where column1 is struct type and there is 1M rows. > (sample data from https://issues.apache.org/jira/browse/SPARK-16320) > {code} > df.where("column1 is not null").count() > {code} > gives: > 1M in Spark 1.6 > *0* in Spark 2.0 > Is there a change in IS NOT NULL behaviour in Spark 2.0 ? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-16371) IS NOT NULL clause gives false for nested not empty column
[ https://issues.apache.org/jira/browse/SPARK-16371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363757#comment-15363757 ] Hyukjin Kwon edited comment on SPARK-16371 at 7/6/16 4:34 AM: -- Here is shorter codes {code} from pyspark.sql.functions import struct child_df = spark.range(10) parent_df = child_df.select(struct("id").alias("id")) parent_df.write.mode('overwrite').parquet('/tmp/test') parent_df = spark.read.parquet('/tmp/test') parent_df.where("id is not null").count() # 0 parent_df.count() # 10 {code} was (Author: hyukjin.kwon): Here is shorter codes {code} from pyspark.sql.functions import struct from pyspark.sql import Row path = '/tmp/test' rdd = sc.parallelize(range(10)) data = rdd.map(lambda r: Row(column0=r)) child_df = spark.createDataFrame(data) parent_df = child_df.select(struct("column0").alias("column0")) parent_df.write.mode('overwrite').parquet(path) parent_df = spark.read.parquet(path) parent_df.where("column0 is not null").count() {code} > IS NOT NULL clause gives false for nested not empty column > -- > > Key: SPARK-16371 > URL: https://issues.apache.org/jira/browse/SPARK-16371 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Maciej Bryński >Priority: Critical > > I have df where column1 is struct type and there is 1M rows. > (sample data from https://issues.apache.org/jira/browse/SPARK-16320) > {code} > df.where("column1 is not null").count() > {code} > gives: > 1M in Spark 1.6 > *0* in Spark 2.0 > Is there a change in IS NOT NULL behaviour in Spark 2.0 ? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16371) IS NOT NULL clause gives false for nested not empty column
[ https://issues.apache.org/jira/browse/SPARK-16371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363757#comment-15363757 ] Hyukjin Kwon commented on SPARK-16371: -- Here is a shorter version {code} from pyspark.sql.functions import struct from pyspark.sql import Row path = '/tmp/test' rdd = sc.parallelize(range(10)) data = rdd.map(lambda r: Row(column0=r)) child_df = spark.createDataFrame(data) parent_df = child_df.select(struct("column0").alias("column0")) parent_df.write.mode('overwrite').parquet(path) parent_df = spark.read.parquet(path) parent_df.where("column0 is not null").count() {code} > IS NOT NULL clause gives false for nested not empty column > -- > > Key: SPARK-16371 > URL: https://issues.apache.org/jira/browse/SPARK-16371 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Maciej Bryński >Priority: Critical > > I have df where column1 is struct type and there is 1M rows. > (sample data from https://issues.apache.org/jira/browse/SPARK-16320) > {code} > df.where("column1 is not null").count() > {code} > gives: > 1M in Spark 1.6 > *0* in Spark 2.0 > Is there a change in IS NOT NULL behaviour in Spark 2.0 ? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16389) Remove useless `MetastoreRelation` from `SparkHiveWriterContainer` and `SparkHiveDynamicPartitionWriterContainer`
[ https://issues.apache.org/jira/browse/SPARK-16389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-16389: Assignee: Xiao Li > Remove useless `MetastoreRelation` from `SparkHiveWriterContainer` and > `SparkHiveDynamicPartitionWriterContainer` > - > > Key: SPARK-16389 > URL: https://issues.apache.org/jira/browse/SPARK-16389 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Xiao Li >Priority: Minor > > - Remove useless `MetastoreRelation` from the signature of > `SparkHiveWriterContainer` and `SparkHiveDynamicPartitionWriterContainer`. > - Avoid extra metadata retrieval using Hive client in `InsertIntoHiveTable`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15730) [Spark SQL] the value of 'hiveconf' parameter in Spark-sql CLI doesn't take effect in spark-sql session
[ https://issues.apache.org/jira/browse/SPARK-15730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363691#comment-15363691 ] Yi Zhou commented on SPARK-15730: - Thanks a lot [~chenghao] and [~yhuai] ! > [Spark SQL] the value of 'hiveconf' parameter in Spark-sql CLI don't take > effect in spark-sql session > - > > Key: SPARK-15730 > URL: https://issues.apache.org/jira/browse/SPARK-15730 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Yi Zhou >Assignee: Cheng Hao >Priority: Critical > Fix For: 2.0.0 > > > {noformat} > /usr/lib/spark/bin/spark-sql -v --driver-memory 4g --executor-memory 7g > --executor-cores 5 --num-executors 31 --master yarn-client --conf > spark.yarn.executor.memoryOverhead=1024 --hiveconf RESULT_TABLE=test_result01 > spark-sql> use test; > 16/06/02 21:36:15 INFO execution.SparkSqlParser: Parsing command: use test > 16/06/02 21:36:15 INFO spark.SparkContext: Starting job: processCmd at > CliDriver.java:376 > 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Got job 2 (processCmd at > CliDriver.java:376) with 1 output partitions > 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Final stage: ResultStage 2 > (processCmd at CliDriver.java:376) > 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Parents of final stage: List() > 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Missing parents: List() > 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Submitting ResultStage 2 > (MapPartitionsRDD[8] at processCmd at CliDriver.java:376), which has no > missing parents > 16/06/02 21:36:15 INFO memory.MemoryStore: Block broadcast_2 stored as values > in memory (estimated size 3.2 KB, free 2.4 GB) > 16/06/02 21:36:15 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as > bytes in memory (estimated size 1964.0 B, free 2.4 GB) > 16/06/02 21:36:15 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in > memory on 192.168.3.11:36189 (size: 1964.0 B, free: 2.4 GB) > 16/06/02 21:36:15 INFO 
spark.SparkContext: Created broadcast 2 from broadcast > at DAGScheduler.scala:1012 > 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Submitting 1 missing tasks > from ResultStage 2 (MapPartitionsRDD[8] at processCmd at CliDriver.java:376) > 16/06/02 21:36:15 INFO cluster.YarnScheduler: Adding task set 2.0 with 1 tasks > 16/06/02 21:36:15 INFO scheduler.TaskSetManager: Starting task 0.0 in stage > 2.0 (TID 2, 192.168.3.13, partition 0, PROCESS_LOCAL, 5362 bytes) > 16/06/02 21:36:15 INFO cluster.YarnClientSchedulerBackend: Launching task 2 > on executor id: 10 hostname: 192.168.3.13. > 16/06/02 21:36:16 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in > memory on hw-node3:45924 (size: 1964.0 B, free: 4.4 GB) > 16/06/02 21:36:17 INFO scheduler.TaskSetManager: Finished task 0.0 in stage > 2.0 (TID 2) in 1934 ms on 192.168.3.13 (1/1) > 16/06/02 21:36:17 INFO cluster.YarnScheduler: Removed TaskSet 2.0, whose > tasks have all completed, from pool > 16/06/02 21:36:17 INFO scheduler.DAGScheduler: ResultStage 2 (processCmd at > CliDriver.java:376) finished in 1.937 s > 16/06/02 21:36:17 INFO scheduler.DAGScheduler: Job 2 finished: processCmd at > CliDriver.java:376, took 1.962631 s > Time taken: 2.027 seconds > 16/06/02 21:36:17 INFO CliDriver: Time taken: 2.027 seconds > spark-sql> DROP TABLE IF EXISTS ${hiveconf:RESULT_TABLE}; > 16/06/02 21:36:36 INFO execution.SparkSqlParser: Parsing command: DROP TABLE > IF EXISTS ${hiveconf:RESULT_TABLE} > Error in query: > mismatched input '$' expecting {'ADD', 'AS', 'ALL', 'GROUP', 'BY', > 'GROUPING', 'SETS', 'CUBE', 'ROLLUP', 'ORDER', 'LIMIT', 'AT', 'IN', 'NO', > 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', > 'ASC', 'DESC', 'FOR', 'OUTER', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', > 'RANGE', 'ROWS', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', > 'VALUES', 'CREATE', 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', > 'DESCRIBE', 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'SHOW', 
'TABLES', > 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'TO', > 'TABLESAMPLE', 'ALTER', 'RENAME', 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', > 'RESET', 'DATA', 'START', 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'IF', > 'PERCENT', 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', > 'OVERWRITE', 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', > 'RECORDREADER', 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', > 'COLLECTION', 'ITEMS', 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'EXTENDED', > 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', TEMPORARY, > 'OPTIONS', 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', > 'STORED', 'DIRECTORIES', 'LOCATION',
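The parse error above arises because the {{$\{hiveconf:RESULT_TABLE\}}} reference reaches the SQL parser unexpanded. The kind of variable substitution the CLI is expected to perform before parsing can be sketched as follows (a hypothetical helper for illustration, not Spark's actual implementation):

```python
import re

def substitute_hiveconf(sql: str, hiveconf: dict) -> str:
    """Expand ${hiveconf:NAME} references before the statement is parsed."""
    def repl(match):
        name = match.group(1)
        if name not in hiveconf:
            # Fail loudly instead of passing the raw '${...}' to the parser.
            raise KeyError("undefined hiveconf variable: " + name)
        return hiveconf[name]
    return re.sub(r"\$\{hiveconf:([^}]+)\}", repl, sql)

# With RESULT_TABLE=test_result01 passed via --hiveconf, the DROP TABLE
# statement from the report would be rewritten before parsing:
expanded = substitute_hiveconf(
    "DROP TABLE IF EXISTS ${hiveconf:RESULT_TABLE}",
    {"RESULT_TABLE": "test_result01"},
)
```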
[jira] [Commented] (SPARK-16387) Reserved SQL words are not escaped by JDBC writer
[ https://issues.apache.org/jira/browse/SPARK-16387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363682#comment-15363682 ] Dongjoon Hyun commented on SPARK-16387: --- Hi, `escaping` sounds possible, but it is not easy to implement portably for all databases. We need to support MySQL, PostgreSQL, MSSQL, and so on. The standard is double quote ("), but even MySQL does not support that naturally. (Only supported in ANSI mode?) MySQL uses backtick (`), but PostgreSQL does not (if I remember correctly). MSSQL uses '[]'. I want to help you with this issue, but I have no good idea yet. Do you have any ideas? > Reserved SQL words are not escaped by JDBC writer > - > > Key: SPARK-16387 > URL: https://issues.apache.org/jira/browse/SPARK-16387 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Lev > > Here is a code (imports are omitted) > object Main extends App { > val sqlSession = SparkSession.builder().config(new SparkConf(). > setAppName("Sql Test").set("spark.app.id", "SQLTest"). > set("spark.master", "local[2]"). 
> set("spark.ui.enabled", "false") > .setJars(Seq("/mysql/mysql-connector-java-5.1.38.jar" )) > ).getOrCreate() > import sqlSession.implicits._ > val localprops = new Properties > localprops.put("user", "") > localprops.put("password", "") > val df = sqlSession.createDataset(Seq("a","b","c")).toDF("order") > val writer = df.write > .mode(SaveMode.Append) > writer > .jdbc("jdbc:mysql://localhost:3306/test3", s"jira_test", localprops) > } > End error is : > com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error > in your SQL syntax; check the manual that corresponds to your MySQL server > version for the right syntax to use near 'order TEXT )' at line 1 > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > Clearly the reserved word has to be quoted -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
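The per-dialect quoting rules discussed in this thread (backticks for MySQL, square brackets for MSSQL, ANSI double quotes elsewhere) can be sketched as a small helper. This is a hypothetical illustration of the approach, not Spark's actual JdbcDialect API; the function names are invented:

```python
def quote_identifier(name: str, dialect: str) -> str:
    """Quote a column/table name so reserved words like `order` are safe."""
    if dialect == "mysql":
        # MySQL uses backticks; escape embedded backticks by doubling them.
        return "`" + name.replace("`", "``") + "`"
    if dialect == "mssql":
        # SQL Server uses square brackets; escape ']' by doubling it.
        return "[" + name.replace("]", "]]") + "]"
    # ANSI standard (PostgreSQL and others): double quotes, doubled inside.
    return '"' + name.replace('"', '""') + '"'

def create_table_ddl(table: str, columns: dict, dialect: str) -> str:
    """Build a CREATE TABLE statement with every identifier quoted."""
    cols = ", ".join(
        quote_identifier(c, dialect) + " " + t for c, t in columns.items()
    )
    return "CREATE TABLE " + quote_identifier(table, dialect) + " (" + cols + ")"
```

For the failing example in the report, quoting would turn the rejected DDL into {{CREATE TABLE `jira_test` (`order` TEXT)}}, which MySQL accepts. The key point from the comment stands: every statement-generation path must go through the same quoting function, not just table creation.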
[jira] [Updated] (SPARK-16286) Implement stack table generating function
[ https://issues.apache.org/jira/browse/SPARK-16286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-16286: Assignee: Dongjoon Hyun > Implement stack table generating function > - > > Key: SPARK-16286 > URL: https://issues.apache.org/jira/browse/SPARK-16286 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Dongjoon Hyun > Fix For: 2.1.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-16286) Implement stack table generating function
[ https://issues.apache.org/jira/browse/SPARK-16286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-16286. - Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14033 [https://github.com/apache/spark/pull/14033] > Implement stack table generating function > - > > Key: SPARK-16286 > URL: https://issues.apache.org/jira/browse/SPARK-16286 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Dongjoon Hyun > Fix For: 2.1.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-16389) Remove useless `MetastoreRelation` from `SparkHiveWriterContainer` and `SparkHiveDynamicPartitionWriterContainer`
[ https://issues.apache.org/jira/browse/SPARK-16389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16389: Assignee: (was: Apache Spark) > Remove useless `MetastoreRelation` from `SparkHiveWriterContainer` and > `SparkHiveDynamicPartitionWriterContainer` > - > > Key: SPARK-16389 > URL: https://issues.apache.org/jira/browse/SPARK-16389 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Priority: Minor > > - Remove useless `MetastoreRelation` from the signature of > `SparkHiveWriterContainer` and `SparkHiveDynamicPartitionWriterContainer`. > - Avoid extra metadata retrieval using Hive client in `InsertIntoHiveTable`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16389) Remove useless `MetastoreRelation` from `SparkHiveWriterContainer` and `SparkHiveDynamicPartitionWriterContainer`
[ https://issues.apache.org/jira/browse/SPARK-16389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363594#comment-15363594 ] Apache Spark commented on SPARK-16389: -- User 'gatorsmile' has created a pull request for this issue: https://github.com/apache/spark/pull/14062 > Remove useless `MetastoreRelation` from `SparkHiveWriterContainer` and > `SparkHiveDynamicPartitionWriterContainer` > - > > Key: SPARK-16389 > URL: https://issues.apache.org/jira/browse/SPARK-16389 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Priority: Minor > > - Remove useless `MetastoreRelation` from the signature of > `SparkHiveWriterContainer` and `SparkHiveDynamicPartitionWriterContainer`. > - Avoid extra metadata retrieval using Hive client in `InsertIntoHiveTable`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-16389) Remove useless `MetastoreRelation` from `SparkHiveWriterContainer` and `SparkHiveDynamicPartitionWriterContainer`
[ https://issues.apache.org/jira/browse/SPARK-16389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16389: Assignee: Apache Spark > Remove useless `MetastoreRelation` from `SparkHiveWriterContainer` and > `SparkHiveDynamicPartitionWriterContainer` > - > > Key: SPARK-16389 > URL: https://issues.apache.org/jira/browse/SPARK-16389 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Apache Spark >Priority: Minor > > - Remove useless `MetastoreRelation` from the signature of > `SparkHiveWriterContainer` and `SparkHiveDynamicPartitionWriterContainer`. > - Avoid extra metadata retrieval using Hive client in `InsertIntoHiveTable`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15761) pyspark shell should load if PYSPARK_DRIVER_PYTHON is ipython and Python3
[ https://issues.apache.org/jira/browse/SPARK-15761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-15761: Fix Version/s: (was: 2.0.1) 2.0.0 > pyspark shell should load if PYSPARK_DRIVER_PYTHON is ipython and Python3 > > > Key: SPARK-15761 > URL: https://issues.apache.org/jira/browse/SPARK-15761 > Project: Spark > Issue Type: Improvement > Components: PySpark >Reporter: Manoj Kumar >Assignee: Manoj Kumar >Priority: Minor > Fix For: 1.6.3, 2.0.0 > > > My default python is ipython3 and it is odd that it fails with "IPython > requires Python 2.7+; please install python2.7 or set PYSPARK_PYTHON" -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16182) Utils.scala -- terminateProcess() should call Process.destroyForcibly() if and only if Process.destroy() fails
[ https://issues.apache.org/jira/browse/SPARK-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-16182: Fix Version/s: (was: 2.0.1) 2.0.0 > Utils.scala -- terminateProcess() should call Process.destroyForcibly() if > and only if Process.destroy() fails > -- > > Key: SPARK-16182 > URL: https://issues.apache.org/jira/browse/SPARK-16182 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1 > Environment: OSX El Capitan (java "1.8.0_65"), Oracle Linux 6 (java > 1.8.0_92-b14) >Reporter: Christian Chua >Assignee: Sean Owen >Priority: Critical > Fix For: 1.6.3, 2.0.0 > > > Spark streaming documentation recommends application developers create static > connection pools. To clean up this pool, we add a shutdown hook. > The problem is that in spark 1.6.1, the shutdown hook for an executor will be > called only for the first submitted job. (on the second and subsequent job > submissions, the shutdown hook for the executor will NOT be invoked) > problem not seen when using java 1.7 > problem not seen when using spark 1.6.0 > looks like this bug is caused by this modification from 1.6.0 to 1.6.1: > https://issues.apache.org/jira/browse/SPARK-12486 > steps to reproduce the problem : > 1.) install spark 1.6.1 > 2.) 
submit this basic spark application > import org.apache.spark.{ SparkContext, SparkConf } > object MyPool { > def printToFile( f : java.io.File )( op : java.io.PrintWriter => Unit ) { > val p = new java.io.PrintWriter(f) > try { > op(p) > } > finally { > p.close() > } > } > def myfunc( ) = { > "a" > } > def createEvidence( ) = { > printToFile(new java.io.File("/var/tmp/evidence.txt")) { p => > p.println("the evidence") > } > } > sys.addShutdownHook { > createEvidence() > } > } > object BasicSpark { > def main( args : Array[String] ) = { > val sparkConf = new SparkConf().setAppName("BasicPi") > val sc = new SparkContext(sparkConf) > sc.parallelize(1 to 2).foreach { i => println("f : " + > MyPool.myfunc()) > } > sc.stop() > } > } > 3.) you will see that /var/tmp/evidence.txt is created > 4.) now delete this file > 5.) submit a second job > 6.) you will see that /var/tmp/evidence.txt is no longer created on the > second submission > 7.) if you use java 7 or spark 1.6.0, the evidence file will be created on > the second and subsequent submits -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-16389) Remove useless `MetastoreRelation` from `SparkHiveWriterContainer` and `SparkHiveDynamicPartitionWriterContainer`
Xiao Li created SPARK-16389: --- Summary: Remove useless `MetastoreRelation` from `SparkHiveWriterContainer` and `SparkHiveDynamicPartitionWriterContainer` Key: SPARK-16389 URL: https://issues.apache.org/jira/browse/SPARK-16389 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.0.0 Reporter: Xiao Li Priority: Minor - Remove useless `MetastoreRelation` from the signature of `SparkHiveWriterContainer` and `SparkHiveDynamicPartitionWriterContainer`. - Avoid extra metadata retrieval using Hive client in `InsertIntoHiveTable`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16353) Intended javadoc options are not honored for Java unidoc
[ https://issues.apache.org/jira/browse/SPARK-16353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-16353: Fix Version/s: (was: 2.0.1) 2.0.0 > Intended javadoc options are not honored for Java unidoc > > > Key: SPARK-16353 > URL: https://issues.apache.org/jira/browse/SPARK-16353 > Project: Spark > Issue Type: Bug > Components: Build, Documentation >Affects Versions: 1.6.2, 2.0.0, 2.0.1, 2.1.0 >Reporter: Michael Allman >Assignee: Michael Allman >Priority: Minor > Fix For: 1.6.3, 2.0.0 > > > {{project/SparkBuild.scala}} specifies > {noformat} > javacOptions in doc := Seq( > "-windowtitle", "Spark " + version.value.replaceAll("-SNAPSHOT", "") + > "JavaDoc", > "-public", > "-noqualifier", "java.lang" > ) > {noformat} > However, {{sbt javaunidoc:doc}} ignores these options. To wit, the title of > http://spark.apache.org/docs/latest/api/java/index.html is {{Generated > Documentation (Untitled)}}, not {{Spark 1.6.2 JavaDoc}} as it should be. > (N.B. the Spark 1.6.2 build defines several javadoc groups as well, which are > also missing from the official docs.) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16382) YARN - Dynamic allocation with spark.executor.instances should increase max executors.
[ https://issues.apache.org/jira/browse/SPARK-16382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363592#comment-15363592 ] Saisai Shao commented on SPARK-16382: - I would suggest to fail and complain. Max usually specifies the upper bound of resources can be used for Spark, usually it should not be exceeded. Also in the YarnSparkHadoopUtil.scala, we have such constraint: {code} require(initialNumExecutors >= minNumExecutors && initialNumExecutors <= maxNumExecutors, s"initial executor number $initialNumExecutors must between min executor number " + s"$minNumExecutors and max executor number $maxNumExecutors") {code} > YARN - Dynamic allocation with spark.executor.instances should increase max > executors. > -- > > Key: SPARK-16382 > URL: https://issues.apache.org/jira/browse/SPARK-16382 > Project: Spark > Issue Type: Bug > Components: YARN >Reporter: Ryan Blue > > SPARK-13723 changed the behavior of dynamic allocation when > {{--num-executors}} ({{spark.executor.instances}}) is set. Rather than > turning off dynamic allocation, the value is used as the initial number of > executors. This did not change the behavior of > {{spark.dynamicAllocation.maxExecutors}}. We've noticed that some users set > {{--num-executors}} higher than the max and the expectation is that the max > increases. > I think that either max should be increased, or Spark should fail and > complain that the executors requested is higher than the max. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
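The "fail and complain" option suggested above amounts to extending the quoted `require` constraint to the `--num-executors` case. A minimal sketch of that validation, in Python rather than Scala, with a hypothetical function name:

```python
def validate_executor_counts(initial, minimum, maximum):
    """Mirror of the YarnSparkHadoopUtil constraint quoted above: the
    initial executor count must lie within [minimum, maximum]."""
    if not (minimum <= initial <= maximum):
        raise ValueError(
            f"initial executor number {initial} must be between min executor "
            f"number {minimum} and max executor number {maximum}")
    return initial

# The scenario from the report: --num-executors set higher than
# spark.dynamicAllocation.maxExecutors should be rejected loudly
# rather than silently capped.
try:
    validate_executor_counts(initial=500, minimum=0, maximum=100)
    rejected = False
except ValueError:
    rejected = True
```

The alternative discussed in the issue is the opposite policy: raise `maximum` to `initial` instead of raising an error.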
[jira] [Assigned] (SPARK-16388) Remove spark.sql.nativeView and spark.sql.nativeView.canonical config
[ https://issues.apache.org/jira/browse/SPARK-16388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16388: Assignee: Apache Spark (was: Reynold Xin) > Remove spark.sql.nativeView and spark.sql.nativeView.canonical config > - > > Key: SPARK-16388 > URL: https://issues.apache.org/jira/browse/SPARK-16388 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Reynold Xin >Assignee: Apache Spark > > These two configs should not be relevant anymore after Spark 2.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-16388) Remove spark.sql.nativeView and spark.sql.nativeView.canonical config
[ https://issues.apache.org/jira/browse/SPARK-16388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16388: Assignee: Reynold Xin (was: Apache Spark) > Remove spark.sql.nativeView and spark.sql.nativeView.canonical config > - > > Key: SPARK-16388 > URL: https://issues.apache.org/jira/browse/SPARK-16388 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > > These two configs should not be relevant anymore after Spark 2.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16388) Remove spark.sql.nativeView and spark.sql.nativeView.canonical config
[ https://issues.apache.org/jira/browse/SPARK-16388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363585#comment-15363585 ] Apache Spark commented on SPARK-16388: -- User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/14061 > Remove spark.sql.nativeView and spark.sql.nativeView.canonical config > - > > Key: SPARK-16388 > URL: https://issues.apache.org/jira/browse/SPARK-16388 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > > These two configs should not be relevant anymore after Spark 2.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-16388) Remove spark.sql.nativeView and spark.sql.nativeView.canonical config
Reynold Xin created SPARK-16388: --- Summary: Remove spark.sql.nativeView and spark.sql.nativeView.canonical config Key: SPARK-16388 URL: https://issues.apache.org/jira/browse/SPARK-16388 Project: Spark Issue Type: Bug Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin These two configs should not be relevant anymore after Spark 2.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16371) IS NOT NULL clause gives false for nested not empty column
[ https://issues.apache.org/jira/browse/SPARK-16371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363548#comment-15363548 ] Hyukjin Kwon commented on SPARK-16371: -- [~maver1ck] [~proflin] I could reproduce this. I will try to narrow it down. > IS NOT NULL clause gives false for nested not empty column > -- > > Key: SPARK-16371 > URL: https://issues.apache.org/jira/browse/SPARK-16371 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Maciej Bryński >Priority: Critical > > I have df where column1 is struct type and there is 1M rows. > (sample data from https://issues.apache.org/jira/browse/SPARK-16320) > {code} > df.where("column1 is not null").count() > {code} > gives: > 1M in Spark 1.6 > *0* in Spark 2.0 > Is there a change in IS NOT NULL behaviour in Spark 2.0 ? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
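The expected semantics being disputed above can be pinned down without Spark at all. Sketched over plain Python records (a stand-in for rows with a struct-typed `column1`): a non-null struct value counts as NOT NULL even when fields inside it are null.

```python
# Plain-Python model of "WHERE column1 IS NOT NULL" on a struct column.
# The row shapes here are illustrative, not the reporter's actual data.
rows = [
    {"column1": {"a": 1}},      # populated struct -> counts
    {"column1": {"a": None}},   # struct present, nested field null -> counts
    {"column1": None},          # struct itself null -> filtered out
]

not_null_count = sum(1 for r in rows if r["column1"] is not None)
```

Spark 1.6 matches this count (2 of 3 rows here, 1M in the reporter's data); the report is that Spark 2.0 returns 0.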
[jira] [Commented] (SPARK-16367) Wheelhouse Support for PySpark
[ https://issues.apache.org/jira/browse/SPARK-16367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363509#comment-15363509 ] Jeff Zhang commented on SPARK-16367: Preparing the wheelhouse seems time consuming to me especially when many packages are needed and themselves also have dependencies as well. If internet is accessible, I would rather ask the cluster to do that. [~gae...@xeberon.net] Do you know whether local repository is supported by python ? So that cluster administrator can create a private wheelhouse so that all the machines in the cluster can access that repository just like private maven repository. > Wheelhouse Support for PySpark > -- > > Key: SPARK-16367 > URL: https://issues.apache.org/jira/browse/SPARK-16367 > Project: Spark > Issue Type: New Feature > Components: Deploy, PySpark >Affects Versions: 1.6.1, 1.6.2, 2.0.0 >Reporter: Semet > Labels: newbie, python, python-wheel, wheelhouse > Original Estimate: 168h > Remaining Estimate: 168h > > *Rational* > Is it recommended, in order to deploying Scala packages written in Scala, to > build big fat jar files. This allows to have all dependencies on one package > so the only "cost" is copy time to deploy this file on every Spark Node. > On the other hand, Python deployment is more difficult once you want to use > external packages, and you don't really want to mess with the IT to deploy > the packages on the virtualenv of each nodes. > *Previous approaches* > I based the current proposal over the two following bugs related to this > point: > - SPARK-6764 ("Wheel support for PySpark") > - SPARK-13587("Support virtualenv in PySpark") > First part of my proposal was to merge, in order to support wheels install > and virtualenv creation > *Uber Fat Wheelhouse for Python Deployment* > In Python, the packaging standard is now "wheels", which goes further that > old good ".egg" files. With a wheel file (".whl"), the package is already > prepared for a given architecture. 
You can have several wheel, each specific > to an architecture, or environment. > The {{pip}} tools now how to select the package matching the current system, > how to install this package in a light speed. Said otherwise, package that > requires compilation of a C module, for instance, does *not* compile anything > when installing from wheel file. > {{pip}} also provides the ability to generate easily all wheel of all > packages used for a given module (inside a "virtualenv"). This is called > "wheelhouse". You can even don't mess with this compilation and retrieve it > directly from pypi.python.org. > *Developer workflow* > Here is, in a more concrete way, my proposal for on Pyspark developers point > of view: > - you are writing a PySpark script that increase in term of size and > dependencies. Deploying on Spark for example requires to build numpy or > Theano and other dependencies > - to use "Big Fat Wheelhouse" support of Pyspark, you need to turn his script > into a standard Python package: > -- write a {{requirements.txt}}. I recommend to specify all package version. > You can use [pip-tools|https://github.com/nvie/pip-tools] to maintain the > requirements.txt > {code} > astroid==1.4.6# via pylint > autopep8==1.2.4 > click==6.6# via pip-tools > colorama==0.3.7 # via pylint > enum34==1.1.6 # via hypothesis > findspark==1.0.0 # via spark-testing-base > first==2.0.1 # via pip-tools > hypothesis==3.4.0 # via spark-testing-base > lazy-object-proxy==1.2.2 # via astroid > linecache2==1.0.0 # via traceback2 > pbr==1.10.0 > pep8==1.7.0 # via autopep8 > pip-tools==1.6.5 > py==1.4.31# via pytest > pyflakes==1.2.3 > pylint==1.5.6 > pytest==2.9.2 # via spark-testing-base > six==1.10.0 # via astroid, pip-tools, pylint, unittest2 > spark-testing-base==0.0.7.post2 > traceback2==1.4.0 # via unittest2 > unittest2==1.1.0 # via spark-testing-base > wheel==0.29.0 > wrapt==1.10.8 # via astroid > {code} > -- write a setup.py with some entry points or package. 
Use > [PBR|http://docs.openstack.org/developer/pbr/] it makes the jobs of > maitaining a setup.py files really easy > -- create a virtualenv if not already in one: > {code} > virtualenv env > {code} > -- Work on your environment, define the requirement you need in > {{requirements.txt}}, do all the {{pip install}} you need. > - create the wheelhouse for your current project > {code} > pip install wheelhouse > pip wheel . --wheel-dir wheelhouse > {code} > This can take some times, but at the end you have all the .whl required *for > your current system* > - zip it into a {{wheelhouse.zip}}. > Note that you can have your own package (for instance
[jira] [Commented] (SPARK-16367) Wheelhouse Support for PySpark
[ https://issues.apache.org/jira/browse/SPARK-16367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363513#comment-15363513 ] Jeff Zhang commented on SPARK-16367: Oh, happen to find this project to build local python package repository. Maybe user can use that http://doc.devpi.net/latest/ > Wheelhouse Support for PySpark > -- > > Key: SPARK-16367 > URL: https://issues.apache.org/jira/browse/SPARK-16367 > Project: Spark > Issue Type: New Feature > Components: Deploy, PySpark >Affects Versions: 1.6.1, 1.6.2, 2.0.0 >Reporter: Semet > Labels: newbie, python, python-wheel, wheelhouse > Original Estimate: 168h > Remaining Estimate: 168h > > *Rational* > Is it recommended, in order to deploying Scala packages written in Scala, to > build big fat jar files. This allows to have all dependencies on one package > so the only "cost" is copy time to deploy this file on every Spark Node. > On the other hand, Python deployment is more difficult once you want to use > external packages, and you don't really want to mess with the IT to deploy > the packages on the virtualenv of each nodes. > *Previous approaches* > I based the current proposal over the two following bugs related to this > point: > - SPARK-6764 ("Wheel support for PySpark") > - SPARK-13587("Support virtualenv in PySpark") > First part of my proposal was to merge, in order to support wheels install > and virtualenv creation > *Uber Fat Wheelhouse for Python Deployment* > In Python, the packaging standard is now "wheels", which goes further that > old good ".egg" files. With a wheel file (".whl"), the package is already > prepared for a given architecture. You can have several wheel, each specific > to an architecture, or environment. > The {{pip}} tools now how to select the package matching the current system, > how to install this package in a light speed. 
Said otherwise, package that > requires compilation of a C module, for instance, does *not* compile anything > when installing from wheel file. > {{pip}} also provides the ability to generate easily all wheel of all > packages used for a given module (inside a "virtualenv"). This is called > "wheelhouse". You can even don't mess with this compilation and retrieve it > directly from pypi.python.org. > *Developer workflow* > Here is, in a more concrete way, my proposal for on Pyspark developers point > of view: > - you are writing a PySpark script that increase in term of size and > dependencies. Deploying on Spark for example requires to build numpy or > Theano and other dependencies > - to use "Big Fat Wheelhouse" support of Pyspark, you need to turn his script > into a standard Python package: > -- write a {{requirements.txt}}. I recommend to specify all package version. > You can use [pip-tools|https://github.com/nvie/pip-tools] to maintain the > requirements.txt > {code} > astroid==1.4.6# via pylint > autopep8==1.2.4 > click==6.6# via pip-tools > colorama==0.3.7 # via pylint > enum34==1.1.6 # via hypothesis > findspark==1.0.0 # via spark-testing-base > first==2.0.1 # via pip-tools > hypothesis==3.4.0 # via spark-testing-base > lazy-object-proxy==1.2.2 # via astroid > linecache2==1.0.0 # via traceback2 > pbr==1.10.0 > pep8==1.7.0 # via autopep8 > pip-tools==1.6.5 > py==1.4.31# via pytest > pyflakes==1.2.3 > pylint==1.5.6 > pytest==2.9.2 # via spark-testing-base > six==1.10.0 # via astroid, pip-tools, pylint, unittest2 > spark-testing-base==0.0.7.post2 > traceback2==1.4.0 # via unittest2 > unittest2==1.1.0 # via spark-testing-base > wheel==0.29.0 > wrapt==1.10.8 # via astroid > {code} > -- write a setup.py with some entry points or package. 
Use > [PBR|http://docs.openstack.org/developer/pbr/] it makes the jobs of > maitaining a setup.py files really easy > -- create a virtualenv if not already in one: > {code} > virtualenv env > {code} > -- Work on your environment, define the requirement you need in > {{requirements.txt}}, do all the {{pip install}} you need. > - create the wheelhouse for your current project > {code} > pip install wheelhouse > pip wheel . --wheel-dir wheelhouse > {code} > This can take some times, but at the end you have all the .whl required *for > your current system* > - zip it into a {{wheelhouse.zip}}. > Note that you can have your own package (for instance 'my_package') be > generated into a wheel and so installed by {{pip}} automatically. > Now comes the time to submit the project: > {code} > bin/spark-submit --master master --deploy-mode client --files > /path/to/virtualenv/requirements.txt,/path/to/virtualenv/wheelhouse.zip > --conf "spark.pyspark.virtualenv.enabled=true"
[jira] [Resolved] (SPARK-16348) pyspark.ml MLSerDe should be called using full classpath
[ https://issues.apache.org/jira/browse/SPARK-16348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley resolved SPARK-16348. --- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 14023 [https://github.com/apache/spark/pull/14023] > pyspark.ml MLSerDe should be called using full classpath > > > Key: SPARK-16348 > URL: https://issues.apache.org/jira/browse/SPARK-16348 > Project: Spark > Issue Type: Bug > Components: ML, PySpark >Affects Versions: 2.0.0 >Reporter: Joseph K. Bradley >Assignee: Joseph K. Bradley >Priority: Critical > Fix For: 2.0.0 > > > Depending on how Spark is set up, pyspark.ml may or may not be able to find > the MLSerDe instance when referenced as {{sc._jvm.MLSerDe}}. This can cause > failures {{'JavaPackage' object is not callable}} when trying to access > Vector or Matrix values from pyspark, such as retrieving the coefficients of > a LinearRegressionModel. > Proposal: Whenever we reference a class in the _jvm from pyspark, we should > use the full classpath: {{sc._jvm.org.apache.spark.ml.python.MLSerDe}}. This > fixes the bug in my case. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
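The error text `'JavaPackage' object is not callable` follows from how py4j resolves names on the JVM gateway: an unresolved name comes back as a `JavaPackage` placeholder rather than failing fast, and only the later call fails. The stub below mimics that behaviour to show the failure mode; it is an illustration of the mechanism, not py4j itself.

```python
class JavaPackage:
    """Stand-in for py4j's placeholder: any attribute access on the JVM
    view that does not resolve to a class yields another package object."""
    def __getattr__(self, name):
        return JavaPackage()

jvm = JavaPackage()

# Short reference, as in sc._jvm.MLSerDe: if the class is not on the
# default import path, this silently resolves to a package placeholder...
short_ref = jvm.MLSerDe

try:
    short_ref.dumps("x")()  # ...and only calling it raises the error
    resolved = True
except TypeError:
    # "'JavaPackage' object is not callable" in real py4j
    resolved = False
```

Hence the proposal in the issue: always reference the full classpath (`sc._jvm.org.apache.spark.ml.python.MLSerDe`), which does not depend on what the gateway happens to have imported.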
[jira] [Resolved] (SPARK-16385) NoSuchMethodException thrown by Utils.waitForProcess
[ https://issues.apache.org/jira/browse/SPARK-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-16385. Resolution: Fixed Assignee: Marcelo Vanzin Fix Version/s: 2.0.0 > NoSuchMethodException thrown by Utils.waitForProcess > > > Key: SPARK-16385 > URL: https://issues.apache.org/jira/browse/SPARK-16385 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin > Fix For: 2.0.0 > > > The code in Utils.waitForProcess catches the wrong exception: when using > reflection, {{NoSuchMethodException}} is thrown, but the code catches > {{NoSuchMethodError}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
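The bug pattern above is worth spelling out: Java reflection raises the checked exception `NoSuchMethodException`, while the handler caught the unrelated `NoSuchMethodError`, so the exception escaped. A Python analog of the same mistake, using `getattr` (which raises `AttributeError`) with a handler written for the wrong type:

```python
def lookup_with_wrong_handler(obj, name):
    # Bug pattern: the except clause names the wrong exception type,
    # so the AttributeError that getattr actually raises escapes.
    try:
        return getattr(obj, name)
    except KeyError:
        return None

def lookup_with_right_handler(obj, name):
    # Correct version: catch what the lookup actually raises.
    try:
        return getattr(obj, name)
    except AttributeError:
        return None

try:
    lookup_with_wrong_handler(object(), "waitFor")
    escaped = False
except AttributeError:
    escaped = True
```

The fix in the issue is exactly this one-type change: catch `NoSuchMethodException` instead of `NoSuchMethodError` in `Utils.waitForProcess`.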
[jira] [Resolved] (SPARK-16383) Remove `SessionState.executeSql`
[ https://issues.apache.org/jira/browse/SPARK-16383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-16383. - Resolution: Fixed Assignee: Dongjoon Hyun Fix Version/s: 2.1.0 > Remove `SessionState.executeSql` > > > Key: SPARK-16383 > URL: https://issues.apache.org/jira/browse/SPARK-16383 > Project: Spark > Issue Type: Task > Components: SQL >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 2.1.0 > > > This issue removes `SessionState.executeSql` in favor of `SparkSession.sql`. > We can remove this safely since the visibility of `SessionState` is > `private[sql]` and `executeSql` is only used in one **ignored** test, > `test("Multiple Hive Instances")`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-16359) unidoc workaround for multiple kafka versions
[ https://issues.apache.org/jira/browse/SPARK-16359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das resolved SPARK-16359. --- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 14041 [https://github.com/apache/spark/pull/14041] > unidoc workaround for multiple kafka versions > - > > Key: SPARK-16359 > URL: https://issues.apache.org/jira/browse/SPARK-16359 > Project: Spark > Issue Type: Sub-task > Components: Streaming >Reporter: Cody Koeninger > Fix For: 2.0.0 > > > sbt unidoc plugin uses dependencyClasspath.all > Because of this, having both kafka 0.8 and 0.10 dependencies on the classpath > causes compilation errors during unidoc. > Need a workaround, possibly to skip 0.10 during unidoc and then try to add it > back later. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-15730) [Spark SQL] the value of 'hiveconf' parameter in Spark-sql CLI don't take effect in spark-sql session
[ https://issues.apache.org/jira/browse/SPARK-15730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-15730. - Resolution: Fixed Assignee: Cheng Hao Fix Version/s: 2.0.0 > [Spark SQL] the value of 'hiveconf' parameter in Spark-sql CLI don't take > effect in spark-sql session > - > > Key: SPARK-15730 > URL: https://issues.apache.org/jira/browse/SPARK-15730 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Yi Zhou >Assignee: Cheng Hao >Priority: Critical > Fix For: 2.0.0 > > > {noformat} > /usr/lib/spark/bin/spark-sql -v --driver-memory 4g --executor-memory 7g > --executor-cores 5 --num-executors 31 --master yarn-client --conf > spark.yarn.executor.memoryOverhead=1024 --hiveconf RESULT_TABLE=test_result01 > spark-sql> use test; > 16/06/02 21:36:15 INFO execution.SparkSqlParser: Parsing command: use test > 16/06/02 21:36:15 INFO spark.SparkContext: Starting job: processCmd at > CliDriver.java:376 > 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Got job 2 (processCmd at > CliDriver.java:376) with 1 output partitions > 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Final stage: ResultStage 2 > (processCmd at CliDriver.java:376) > 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Parents of final stage: List() > 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Missing parents: List() > 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Submitting ResultStage 2 > (MapPartitionsRDD[8] at processCmd at CliDriver.java:376), which has no > missing parents > 16/06/02 21:36:15 INFO memory.MemoryStore: Block broadcast_2 stored as values > in memory (estimated size 3.2 KB, free 2.4 GB) > 16/06/02 21:36:15 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as > bytes in memory (estimated size 1964.0 B, free 2.4 GB) > 16/06/02 21:36:15 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in > memory on 192.168.3.11:36189 (size: 1964.0 B, free: 2.4 GB) > 16/06/02 21:36:15 INFO spark.SparkContext: 
Created broadcast 2 from broadcast > at DAGScheduler.scala:1012 > 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Submitting 1 missing tasks > from ResultStage 2 (MapPartitionsRDD[8] at processCmd at CliDriver.java:376) > 16/06/02 21:36:15 INFO cluster.YarnScheduler: Adding task set 2.0 with 1 tasks > 16/06/02 21:36:15 INFO scheduler.TaskSetManager: Starting task 0.0 in stage > 2.0 (TID 2, 192.168.3.13, partition 0, PROCESS_LOCAL, 5362 bytes) > 16/06/02 21:36:15 INFO cluster.YarnClientSchedulerBackend: Launching task 2 > on executor id: 10 hostname: 192.168.3.13. > 16/06/02 21:36:16 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in > memory on hw-node3:45924 (size: 1964.0 B, free: 4.4 GB) > 16/06/02 21:36:17 INFO scheduler.TaskSetManager: Finished task 0.0 in stage > 2.0 (TID 2) in 1934 ms on 192.168.3.13 (1/1) > 16/06/02 21:36:17 INFO cluster.YarnScheduler: Removed TaskSet 2.0, whose > tasks have all completed, from pool > 16/06/02 21:36:17 INFO scheduler.DAGScheduler: ResultStage 2 (processCmd at > CliDriver.java:376) finished in 1.937 s > 16/06/02 21:36:17 INFO scheduler.DAGScheduler: Job 2 finished: processCmd at > CliDriver.java:376, took 1.962631 s > Time taken: 2.027 seconds > 16/06/02 21:36:17 INFO CliDriver: Time taken: 2.027 seconds > spark-sql> DROP TABLE IF EXISTS ${hiveconf:RESULT_TABLE}; > 16/06/02 21:36:36 INFO execution.SparkSqlParser: Parsing command: DROP TABLE > IF EXISTS ${hiveconf:RESULT_TABLE} > Error in query: > mismatched input '$' expecting {'ADD', 'AS', 'ALL', 'GROUP', 'BY', > 'GROUPING', 'SETS', 'CUBE', 'ROLLUP', 'ORDER', 'LIMIT', 'AT', 'IN', 'NO', > 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', > 'ASC', 'DESC', 'FOR', 'OUTER', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', > 'RANGE', 'ROWS', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', > 'VALUES', 'CREATE', 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', > 'DESCRIBE', 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'SHOW', 'TABLES', > 'COLUMNS', 
'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'TO', > 'TABLESAMPLE', 'ALTER', 'RENAME', 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', > 'RESET', 'DATA', 'START', 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'IF', > 'PERCENT', 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', > 'OVERWRITE', 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', > 'RECORDREADER', 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', > 'COLLECTION', 'ITEMS', 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'EXTENDED', > 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', TEMPORARY, > 'OPTIONS', 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', > 'STORED', 'DIRECTORIES',
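The parse error above arises because the `${hiveconf:RESULT_TABLE}` reference reaches the SQL parser unexpanded. What the CLI is expected to do is substitute `--hiveconf` variables into the statement before parsing. A minimal sketch of that substitution step (real Hive variable substitution supports more namespaces, e.g. `hivevar` and `system`, than this):

```python
import re

def substitute_hiveconf(sql, hiveconf):
    """Expand ${hiveconf:NAME} references using the given variable map,
    failing loudly on undefined names."""
    def repl(match):
        name = match.group(1)
        if name not in hiveconf:
            raise KeyError(f"undefined hiveconf variable: {name}")
        return hiveconf[name]
    return re.sub(r"\$\{hiveconf:([^}]+)\}", repl, sql)

# The statement from the report, with RESULT_TABLE=test_result01
# supplied on the spark-sql command line via --hiveconf.
stmt = substitute_hiveconf(
    "DROP TABLE IF EXISTS ${hiveconf:RESULT_TABLE}",
    {"RESULT_TABLE": "test_result01"},
)
```

After substitution the parser sees a plain identifier and the `mismatched input '$'` error cannot occur.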
[jira] [Commented] (SPARK-16340) In regexp_replace function column and/or column expression should also allowed as replacement.
[ https://issues.apache.org/jira/browse/SPARK-16340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363482#comment-15363482 ] Dongjoon Hyun commented on SPARK-16340: --- Please check if the PR is what you want. :) BTW, it will be 2.1.0 in case of merging. > In regexp_replace function column and/or column expression should also > allowed as replacement. > -- > > Key: SPARK-16340 > URL: https://issues.apache.org/jira/browse/SPARK-16340 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.2 >Reporter: Mukul Garg >Priority: Minor > Original Estimate: 24h > Remaining Estimate: 24h > > Currently, regexp_replace function only take string argument as replacement, > but in Hive it also accept any column or column expression. It also works in > Spark, but as a query. Need to create a overload of this function which also > accept Column as replacement. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
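The requested overload amounts to evaluating the replacement per row instead of once as a literal. Sketched row-by-row with the stdlib `re` module (column names here are invented for illustration; this is the semantics asked for, not Spark's implementation):

```python
import re

# Each row supplies its own replacement value, the way a Column-typed
# replacement argument to regexp_replace would.
rows = [
    {"text": "id=123", "pattern": r"\d+", "replacement": "XXX"},
    {"text": "user-7", "pattern": r"\d+", "replacement": "N"},
]

results = [re.sub(r["pattern"], r["replacement"], r["text"]) for r in rows]
```

With only a string-literal replacement, every row would have to share one replacement value; the per-row form is what Hive already permits.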
[jira] [Commented] (SPARK-16340) In regexp_replace function column and/or column expression should also allowed as replacement.
[ https://issues.apache.org/jira/browse/SPARK-16340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363478#comment-15363478 ] Apache Spark commented on SPARK-16340: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/14060 > In regexp_replace function column and/or column expression should also > allowed as replacement. > -- > > Key: SPARK-16340 > URL: https://issues.apache.org/jira/browse/SPARK-16340 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.2 >Reporter: Mukul Garg >Priority: Minor > Original Estimate: 24h > Remaining Estimate: 24h > > Currently, regexp_replace function only take string argument as replacement, > but in Hive it also accept any column or column expression. It also works in > Spark, but as a query. Need to create a overload of this function which also > accept Column as replacement. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-16340) In regexp_replace function column and/or column expression should also allowed as replacement.
[ https://issues.apache.org/jira/browse/SPARK-16340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16340: Assignee: (was: Apache Spark) > In regexp_replace function column and/or column expression should also > allowed as replacement. > -- > > Key: SPARK-16340 > URL: https://issues.apache.org/jira/browse/SPARK-16340 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.2 >Reporter: Mukul Garg >Priority: Minor > Original Estimate: 24h > Remaining Estimate: 24h > > Currently, regexp_replace function only take string argument as replacement, > but in Hive it also accept any column or column expression. It also works in > Spark, but as a query. Need to create a overload of this function which also > accept Column as replacement. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-16340) In regexp_replace function column and/or column expression should also allowed as replacement.
[ https://issues.apache.org/jira/browse/SPARK-16340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16340: Assignee: Apache Spark > In regexp_replace function column and/or column expression should also > allowed as replacement. > -- > > Key: SPARK-16340 > URL: https://issues.apache.org/jira/browse/SPARK-16340 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.2 >Reporter: Mukul Garg >Assignee: Apache Spark >Priority: Minor > Original Estimate: 24h > Remaining Estimate: 24h > > Currently, regexp_replace function only take string argument as replacement, > but in Hive it also accept any column or column expression. It also works in > Spark, but as a query. Need to create a overload of this function which also > accept Column as replacement. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16340) In regexp_replace function column and/or column expression should also allowed as replacement.
[ https://issues.apache.org/jira/browse/SPARK-16340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363469#comment-15363469 ] Dongjoon Hyun commented on SPARK-16340: --- Hi, [~mukul.ga...@gmail.com]. I'll make a PR for this issue. > In regexp_replace function column and/or column expression should also > allowed as replacement. > -- > > Key: SPARK-16340 > URL: https://issues.apache.org/jira/browse/SPARK-16340 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.2 >Reporter: Mukul Garg >Priority: Minor > Original Estimate: 24h > Remaining Estimate: 24h > > Currently, regexp_replace function only take string argument as replacement, > but in Hive it also accept any column or column expression. It also works in > Spark, but as a query. Need to create a overload of this function which also > accept Column as replacement. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16387) Reserved SQL words are not escaped by JDBC writer
[ https://issues.apache.org/jira/browse/SPARK-16387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lev updated SPARK-16387: Summary: Reserved SQL words are not escaped by JDBC writer (was: Reserved words are not escaped for JDBC writer) > Reserved SQL words are not escaped by JDBC writer > - > > Key: SPARK-16387 > URL: https://issues.apache.org/jira/browse/SPARK-16387 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Lev > > Here is a code (imports are omitted) > object Main extends App { > val sqlSession = SparkSession.builder().config(new SparkConf(). > setAppName("Sql Test").set("spark.app.id", "SQLTest"). > set("spark.master", "local[2]"). > set("spark.ui.enabled", "false") > .setJars(Seq("/mysql/mysql-connector-java-5.1.38.jar" )) > ).getOrCreate() > import sqlSession.implicits._ > val localprops = new Properties > localprops.put("user", "") > localprops.put("password", "") > val df = sqlSession.createDataset(Seq("a","b","c")).toDF("order") > val writer = df.write > .mode(SaveMode.Append) > writer > .jdbc("jdbc:mysql://localhost:3306/test3", s"jira_test", localprops) > } > End error is : > com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error > in your SQL syntax; check the manual that corresponds to your MySQL server > version for the right syntax to use near 'order TEXT )' at line 1 > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > Clearly the reserved word has to be quoted -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-16387) Reserved words are not escaped for JDBC writer
Lev created SPARK-16387: --- Summary: Reserved words are not escaped for JDBC writer Key: SPARK-16387 URL: https://issues.apache.org/jira/browse/SPARK-16387 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Lev Here is the code (imports are omitted): object Main extends App { val sqlSession = SparkSession.builder().config(new SparkConf(). setAppName("Sql Test").set("spark.app.id", "SQLTest"). set("spark.master", "local[2]"). set("spark.ui.enabled", "false") .setJars(Seq("/mysql/mysql-connector-java-5.1.38.jar" )) ).getOrCreate() import sqlSession.implicits._ val localprops = new Properties localprops.put("user", "") localprops.put("password", "") val df = sqlSession.createDataset(Seq("a","b","c")).toDF("order") val writer = df.write .mode(SaveMode.Append) writer .jdbc("jdbc:mysql://localhost:3306/test3", s"jira_test", localprops) } The resulting error is: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'order TEXT )' at line 1 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:422) Clearly, the reserved word has to be quoted.
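The fix amounts to quoting identifiers before embedding them in generated SQL. As a rough sketch (the helper names below are illustrative, not Spark's actual JDBC-dialect API), MySQL-style backtick quoting looks like this:

```scala
// Hedged sketch: one way to quote a column name so that reserved words such
// as `order` survive in generated DDL. Helper names are illustrative only.
def quoteIdentifier(name: String): String =
  "`" + name.replace("`", "``") + "`"  // MySQL doubles embedded backticks

// Building a column definition with the quoted identifier:
def columnDdl(name: String, sqlType: String): String =
  s"${quoteIdentifier(name)} $sqlType"

println(columnDdl("order", "TEXT"))  // `order` TEXT
```

With the column quoted, the generated fragment becomes `order` TEXT (backquoted) and MySQL accepts it. Each database has its own quoting character, which is why this logic naturally belongs in a per-dialect hook rather than in the writer itself.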
[jira] [Commented] (SPARK-16377) Spark MLlib: MultilayerPerceptronClassifier - error while training
[ https://issues.apache.org/jira/browse/SPARK-16377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363458#comment-15363458 ] Mikhail Shiryaev commented on SPARK-16377: -- The original issue with ArrayIndexOutOfBoundsException was due to a bug in my code (an inconsistency between the layers and the real feature count). And the issue with "ERROR StrongWolfeLineSearch" isn't reproducible yet. Sorry for taking your time, and thank you for the quick responses. > Spark MLlib: MultilayerPerceptronClassifier - error while training > -- > > Key: SPARK-16377 > URL: https://issues.apache.org/jira/browse/SPARK-16377 > Project: Spark > Issue Type: Bug > Components: ML, MLlib >Affects Versions: 1.5.2 >Reporter: Mikhail Shiryaev > > Hi, > I am trying to train a model with MultilayerPerceptronClassifier. > It works on sample data from > data/mllib/sample_multiclass_classification_data.txt with 4 features, 3 > classes and layers [4, 4, 3]. > But when I try to use other input files with different numbers of features and classes (from > here for example: > https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html) > then I get errors. > Example: > Input file aloi (128 features, 1000 classes, layers [128, 128, 1000]): > with block size = 1: > ERROR StrongWolfeLineSearch: Encountered bad values in function evaluation. > Decreasing step size to Infinity > ERROR LBFGS: Failure! Resetting history: breeze.optimize.FirstOrderException: > Line search failed > ERROR LBFGS: Failure again! Giving up and returning. Maybe the objective is > just poorly behaved? 
> with default block size = 128: > java.lang.ArrayIndexOutOfBoundsException > at java.lang.System.arraycopy(Native Method) > at > org.apache.spark.ml.ann.DataStacker$$anonfun$3$$anonfun$apply$3$$anonfun$apply$4.apply(Layer.scala:629) > > at > org.apache.spark.ml.ann.DataStacker$$anonfun$3$$anonfun$apply$3$$anonfun$apply$4.apply(Layer.scala:628) > >at scala.collection.immutable.List.foreach(List.scala:381) >at > org.apache.spark.ml.ann.DataStacker$$anonfun$3$$anonfun$apply$3.apply(Layer.scala:628) > >at > org.apache.spark.ml.ann.DataStacker$$anonfun$3$$anonfun$apply$3.apply(Layer.scala:624) > > Even if I modify the sample_multiclass_classification_data.txt file (renaming all > 4th features to 5th) and run with layers [5, 5, 3], I also get the same > errors as for the file above. > So to summarize: > I can't run training with the default block size and with more than 4 features. > If I set the block size to 1 then training makes some progress, but I get errors > from LBFGS. > It is reproducible with Spark 1.5.2 and from the master branch on GitHub (from > July 4th). > Has anybody already encountered such behavior? > Is there a bug in MultilayerPerceptronClassifier, or am I using it incorrectly? > Thanks.
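As the follow-up comment explains, the root cause was an inconsistency between the layer sizes and the real feature and class counts. A minimal sanity check of that invariant, sketched here with illustrative names (this is not a Spark API), would catch the mistake before training starts:

```scala
// Illustrative pre-training check (not part of Spark): in an MLP classifier
// the first layer size must equal the feature count and the last layer size
// must equal the number of classes.
def validateLayers(layers: Seq[Int], numFeatures: Int, numClasses: Int): Unit = {
  require(layers.nonEmpty && layers.head == numFeatures,
    s"first layer must equal the feature count $numFeatures, got ${layers.headOption}")
  require(layers.last == numClasses,
    s"last layer must equal the class count $numClasses, got ${layers.last}")
}

validateLayers(Seq(4, 4, 3), numFeatures = 4, numClasses = 3)        // ok
validateLayers(Seq(128, 128, 1000), numFeatures = 128, numClasses = 1000) // ok
```

For the aloi example above, layers [128, 128, 1000] are consistent with 128 features and 1000 classes, so the check passes; feeding data whose actual feature count differs from layers.head is what triggered the ArrayIndexOutOfBoundsException.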
[jira] [Commented] (SPARK-16240) model loading backward compatibility for ml.clustering.LDA
[ https://issues.apache.org/jira/browse/SPARK-16240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363454#comment-15363454 ] Joseph K. Bradley commented on SPARK-16240: --- +1 for adding special logic in the loading code. That's the general plan for handling backwards compatibility when the model storage format changes. Thanks! > model loading backward compatibility for ml.clustering.LDA > -- > > Key: SPARK-16240 > URL: https://issues.apache.org/jira/browse/SPARK-16240 > Project: Spark > Issue Type: Bug >Reporter: yuhao yang >Priority: Minor > > After resolving the matrix conversion issue, the LDA model still cannot load 1.6 > models, as one of the parameter names has changed. > https://github.com/apache/spark/pull/12065 > We can perhaps add some special logic in the loading code.
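A minimal sketch of what such "special logic" in the loading path could look like, with placeholder names (the actual renamed LDA parameter is not identified in this thread):

```scala
// Hypothetical illustration only: map a 1.6-era parameter name to its
// current name before applying loaded params to the model. The names in
// this table are placeholders, not the real renamed LDA parameter.
val renamedParams = Map("oldParamName" -> "newParamName")

def currentName(loadedName: String): String =
  renamedParams.getOrElse(loadedName, loadedName)

println(currentName("oldParamName")) // remapped to the new name
println(currentName("k"))            // unchanged params pass through
```

The point of the design is that old saved models stay loadable without rewriting the stored files: the translation happens once, at load time.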
[jira] [Comment Edited] (SPARK-16334) [SQL] SQL query on parquet table java.lang.ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/SPARK-16334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363438#comment-15363438 ] Vladimir Ivanov edited comment on SPARK-16334 at 7/5/16 11:13 PM: -- Hi, we discovered a problem with the same stacktrace in Spark 2.0. In our case it's thrown during the DataFrame.rdd call. Moreover, it somehow depends on the volume of data, because it is not thrown when we change the filter criteria accordingly. We used SparkSQL to write these parquet files and didn't explicitly specify the WriterVersion option, so I believe whatever version is set by default was used. was (Author: vivanov): Hi, we discovered problem with the same stacktrace in Spark 2.0. In our case it's thrown during {code}DataFrame.rdd{code} call. Moreover it somehow depends on volume of data, because it is not thrown when we change filter criteria accordingly. We used SparkSQL to write these parquet files and didn't explicitly specify {code}WriterVersion{code} option so I believe whatever version is set by default was used. 
> [SQL] SQL query on parquet table java.lang.ArrayIndexOutOfBoundsException > - > > Key: SPARK-16334 > URL: https://issues.apache.org/jira/browse/SPARK-16334 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Egor Pahomov >Priority: Critical > Labels: sql > > Query: > {code} > select * from blabla where user_id = 415706251 > {code} > Error: > {code} > 16/06/30 14:07:27 WARN scheduler.TaskSetManager: Lost task 11.0 in stage 0.0 > (TID 3, hadoop6): java.lang.ArrayIndexOutOfBoundsException: 6934 > at > org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary.decodeToBinary(PlainValuesDictionary.java:119) > at > org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.decodeDictionaryIds(VectorizedColumnReader.java:273) > at > org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBatch(VectorizedColumnReader.java:170) > at > org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:230) > at > org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:137) > at > org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:36) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:246) > at > 
org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:240) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:780) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:780) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > {code} > Work on 1.6.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16334) [SQL] SQL query on parquet table java.lang.ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/SPARK-16334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363438#comment-15363438 ] Vladimir Ivanov commented on SPARK-16334: - Hi, we discovered problem with the same stacktrace in Spark 2.0. In our case it's thrown during {code}DataFrame.rdd{code} call. Moreover it somehow depends on volume of data, because it is not thrown when we change filter criteria accordingly. We used SparkSQL to write these parquet files and didn't explicitly specify {code}WriterVersion{code} option so I believe whatever version is set by default was used. > [SQL] SQL query on parquet table java.lang.ArrayIndexOutOfBoundsException > - > > Key: SPARK-16334 > URL: https://issues.apache.org/jira/browse/SPARK-16334 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Egor Pahomov >Priority: Critical > Labels: sql > > Query: > {code} > select * from blabla where user_id = 415706251 > {code} > Error: > {code} > 16/06/30 14:07:27 WARN scheduler.TaskSetManager: Lost task 11.0 in stage 0.0 > (TID 3, hadoop6): java.lang.ArrayIndexOutOfBoundsException: 6934 > at > org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary.decodeToBinary(PlainValuesDictionary.java:119) > at > org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.decodeDictionaryIds(VectorizedColumnReader.java:273) > at > org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBatch(VectorizedColumnReader.java:170) > at > org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:230) > at > org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:137) > at > org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:36) > at > 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:246) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:240) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:780) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:780) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > {code} > Work on 1.6.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16384) FROM_UNIXTIME reports incorrect days
[ https://issues.apache.org/jira/browse/SPARK-16384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363420#comment-15363420 ] Dongjoon Hyun commented on SPARK-16384: --- It's the behavior of Java `SimpleDateFormat`. Hive also returns the same result. > FROM_UNIXTIME reports incorrect days > > > Key: SPARK-16384 > URL: https://issues.apache.org/jira/browse/SPARK-16384 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.1, 1.6.2, 2.0.0 >Reporter: Frank Stratton > > Timestamps between 2015-12-27 and 2015-12-31 are reported in the incorrect > year (2016-12-*): > {quote} > results = sqlContext.sql("SELECT FROM_UNIXTIME(1451088000, 'YYYY-MM-dd')") > # 2015-12-26 > print results.collect() > results = sqlContext.sql("SELECT FROM_UNIXTIME(1451174400, 'YYYY-MM-dd')") > # 2015-12-27 > print results.collect() > results = sqlContext.sql("SELECT FROM_UNIXTIME(1451260800, 'YYYY-MM-dd')") > # 2015-12-28 > print results.collect() > results = sqlContext.sql("SELECT FROM_UNIXTIME(1451347200, 'YYYY-MM-dd')") > # 2015-12-29 > print results.collect() > {quote} > outputs: > {quote} > [Row(_c0=u'2015-12-26')] > [Row(_c0=u'2016-12-27')] > [Row(_c0=u'2016-12-28')] > [Row(_c0=u'2016-12-29')] > {quote}
[jira] [Commented] (SPARK-16384) FROM_UNIXTIME reports incorrect days
[ https://issues.apache.org/jira/browse/SPARK-16384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363418#comment-15363418 ] Dongjoon Hyun commented on SPARK-16384: --- Hi, [~epanastasi] That is not a bug. You should use 'yyyy' instead of 'YYYY'. {code} scala> sql("SELECT FROM_UNIXTIME(1451260800, 'YYYY-MM-dd')").collect res1: Array[org.apache.spark.sql.Row] = Array([2016-12-27]) scala> sql("SELECT FROM_UNIXTIME(1451260800, 'yyyy-MM-dd')").collect res2: Array[org.apache.spark.sql.Row] = Array([2015-12-27]) {code} > FROM_UNIXTIME reports incorrect days > > > Key: SPARK-16384 > URL: https://issues.apache.org/jira/browse/SPARK-16384 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.1, 1.6.2, 2.0.0 >Reporter: Frank Stratton > > Timestamps between 2015-12-27 and 2015-12-31 are reported in the incorrect > year (2016-12-*): > {quote} > results = sqlContext.sql("SELECT FROM_UNIXTIME(1451088000, 'YYYY-MM-dd')") > # 2015-12-26 > print results.collect() > results = sqlContext.sql("SELECT FROM_UNIXTIME(1451174400, 'YYYY-MM-dd')") > # 2015-12-27 > print results.collect() > results = sqlContext.sql("SELECT FROM_UNIXTIME(1451260800, 'YYYY-MM-dd')") > # 2015-12-28 > print results.collect() > results = sqlContext.sql("SELECT FROM_UNIXTIME(1451347200, 'YYYY-MM-dd')") > # 2015-12-29 > print results.collect() > {quote} > outputs: > {quote} > [Row(_c0=u'2015-12-26')] > [Row(_c0=u'2016-12-27')] > [Row(_c0=u'2016-12-28')] > [Row(_c0=u'2016-12-29')] > {quote}
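The behavior is reproducible with SimpleDateFormat alone, outside Spark: the pattern letter 'Y' means week year, which for the last days of December can already belong to the following year, while 'y' is the calendar year. A self-contained sketch, with the locale and time zone pinned so the result is deterministic:

```scala
import java.text.SimpleDateFormat
import java.util.{Date, Locale, TimeZone}

// 1451260800 seconds since the epoch = 2015-12-28 00:00:00 UTC
val ts = new Date(1451260800L * 1000L)

def fmt(pattern: String): String = {
  val f = new SimpleDateFormat(pattern, Locale.US)
  f.setTimeZone(TimeZone.getTimeZone("UTC"))
  f.format(ts)
}

// 'Y' is the week year: under US week rules, the week containing
// 2015-12-28 also contains 2016-01-01, so it counts as week 1 of 2016.
println(fmt("YYYY-MM-dd")) // 2016-12-28
// 'y' is the calendar year, which is what a date format almost always wants.
println(fmt("yyyy-MM-dd")) // 2015-12-28
```

This is why Hive and Spark agree here: both delegate to the JVM's formatter, so the fix is in the pattern string, not in either engine.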
[jira] [Commented] (SPARK-11857) Remove Mesos fine-grained mode subject to discussions
[ https://issues.apache.org/jira/browse/SPARK-11857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363410#comment-15363410 ] Michael Gummelt commented on SPARK-11857: - Thanks! > Remove Mesos fine-grained mode subject to discussions > - > > Key: SPARK-11857 > URL: https://issues.apache.org/jira/browse/SPARK-11857 > Project: Spark > Issue Type: Sub-task > Components: Mesos >Reporter: Reynold Xin >Assignee: Reynold Xin > > See discussions in > http://apache-spark-developers-list.1001551.n3.nabble.com/Removing-the-Mesos-fine-grained-mode-td15277.html > and > http://apache-spark-developers-list.1001551.n3.nabble.com/Please-reply-if-you-use-Mesos-fine-grained-mode-td14930.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16386) SQLContext and HiveContext parse a query string differently
[ https://issues.apache.org/jira/browse/SPARK-16386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hao Ren updated SPARK-16386: Description: I just want to figure out why the two contexts behave differently even on a simple query. In a nutshell, I have a query in which there is a String containing a single quote and a cast to Array/Map. I have tried all combinations of sql context type and query call api (sql, df.select, df.selectExpr). I can't find one that rules them all. Here is the code for reproducing the problem. {code} import org.apache.spark.sql.SQLContext import org.apache.spark.sql.hive.HiveContext import org.apache.spark.{SparkConf, SparkContext} object Test extends App { val sc = new SparkContext("local[2]", "test", new SparkConf) val hiveContext = new HiveContext(sc) val sqlContext = new SQLContext(sc) val context = hiveContext // val context = sqlContext import context.implicits._ val df = Seq((Seq(1, 2), 2)).toDF("a", "b") df.registerTempTable("tbl") df.printSchema() // case 1 context.sql("select cast(a as array<string>) from tbl").show() // HiveContext => org.apache.spark.sql.AnalysisException: cannot recognize input near 'array' '<' 'string' in primitive type specification; line 1 pos 17 // SQLContext => OK // case 2 context.sql("select 'a\\'b'").show() // HiveContext => OK // SQLContext => failure: ``union'' expected but ErrorToken(unclosed string literal) found // case 3 df.selectExpr("cast(a as array<string>)").show() // OK with HiveContext and SQLContext // case 4 df.selectExpr("'a\\'b'").show() // HiveContext, SQLContext => failure: end of input expected } {code} was: I just want to figure out why the two contexts behavior differently even on a simple query. In a netshell, I have a query in which there is a String containing single quote and casting to Array/Map. I have tried all the combination of diff type of sql context and query call api (sql, df.select, df.selectExpr). I can't find one rules all. 
Here is the code for reproducing the problem. {code: java} import org.apache.spark.sql.SQLContext import org.apache.spark.sql.hive.HiveContext import org.apache.spark.{SparkConf, SparkContext} object Test extends App { val sc = new SparkContext("local[2]", "test", new SparkConf) val hiveContext = new HiveContext(sc) val sqlContext = new SQLContext(sc) val context = hiveContext // val context = sqlContext import context.implicits._ val df = Seq((Seq(1, 2), 2)).toDF("a", "b") df.registerTempTable("tbl") df.printSchema() // case 1 context.sql("select cast(a as array) from tbl").show() // HiveContext => org.apache.spark.sql.AnalysisException: cannot recognize input near 'array' '<' 'string' in primitive type specification; line 1 pos 17 // SQLContext => OK // case 2 context.sql("select 'a\\'b'").show() // HiveContext => OK // SQLContext => failure: ``union'' expected but ErrorToken(unclosed string literal) found // case 3 df.selectExpr("cast(a as array)").show() // OK with HiveContext and SQLContext // case 4 df.selectExpr("'a\\'b'").show() // HiveContext, SQLContext => failure: end of input expected } {code} > SQLContext and HiveContext parse a query string differently > --- > > Key: SPARK-16386 > URL: https://issues.apache.org/jira/browse/SPARK-16386 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0, 1.6.1, 1.6.2 > Environment: scala 2.10, 2.11 >Reporter: Hao Ren > Labels: patch > > I just want to figure out why the two contexts behavior differently even on a > simple query. > In a netshell, I have a query in which there is a String containing single > quote and casting to Array/Map. > I have tried all the combination of diff type of sql context and query call > api (sql, df.select, df.selectExpr). > I can't find one rules all. > Here is the code for reproducing the problem. 
> {code} > import org.apache.spark.sql.SQLContext > import org.apache.spark.sql.hive.HiveContext > import org.apache.spark.{SparkConf, SparkContext} > object Test extends App { > val sc = new SparkContext("local[2]", "test", new SparkConf) > val hiveContext = new HiveContext(sc) > val sqlContext = new SQLContext(sc) > val context = hiveContext > // val context = sqlContext > import context.implicits._ > val df = Seq((Seq(1, 2), 2)).toDF("a", "b") > df.registerTempTable("tbl") > df.printSchema() > // case 1 > context.sql("select cast(a as array) from tbl").show() > // HiveContext => org.apache.spark.sql.AnalysisException: cannot recognize > input near 'array' '<' 'string' in primitive type specification; line 1 pos 17 > // SQLContext => OK > // case 2 > context.sql("select 'a\\'b'").show() > //
[jira] [Updated] (SPARK-16386) SQLContext and HiveContext parse a query string differently
[ https://issues.apache.org/jira/browse/SPARK-16386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hao Ren updated SPARK-16386: Description: I just want to figure out why the two contexts behavior differently even on a simple query. In a netshell, I have a query in which there is a String containing single quote and casting to Array/Map. I have tried all the combination of diff type of sql context and query call api (sql, df.select, df.selectExpr). I can't find one rules all. Here is the code for reproducing the problem. {code: java} import org.apache.spark.sql.SQLContext import org.apache.spark.sql.hive.HiveContext import org.apache.spark.{SparkConf, SparkContext} object Test extends App { val sc = new SparkContext("local[2]", "test", new SparkConf) val hiveContext = new HiveContext(sc) val sqlContext = new SQLContext(sc) val context = hiveContext // val context = sqlContext import context.implicits._ val df = Seq((Seq(1, 2), 2)).toDF("a", "b") df.registerTempTable("tbl") df.printSchema() // case 1 context.sql("select cast(a as array) from tbl").show() // HiveContext => org.apache.spark.sql.AnalysisException: cannot recognize input near 'array' '<' 'string' in primitive type specification; line 1 pos 17 // SQLContext => OK // case 2 context.sql("select 'a\\'b'").show() // HiveContext => OK // SQLContext => failure: ``union'' expected but ErrorToken(unclosed string literal) found // case 3 df.selectExpr("cast(a as array)").show() // OK with HiveContext and SQLContext // case 4 df.selectExpr("'a\\'b'").show() // HiveContext, SQLContext => failure: end of input expected } {code} was: I just want to figure out why the two contexts behavior differently even on a simple query. In a netshell, I have a query in which there is a String containing single quote and casting to Array/Map. I have tried all the combination of diff type of sql context and query call api (sql, df.select, df.selectExpr). I can't find one rules all. 
Here is the code for reproducing the problem. {code: javaj} import org.apache.spark.sql.SQLContext import org.apache.spark.sql.hive.HiveContext import org.apache.spark.{SparkConf, SparkContext} object Test extends App { val sc = new SparkContext("local[2]", "test", new SparkConf) val hiveContext = new HiveContext(sc) val sqlContext = new SQLContext(sc) val context = hiveContext // val context = sqlContext import context.implicits._ val df = Seq((Seq(1, 2), 2)).toDF("a", "b") df.registerTempTable("tbl") df.printSchema() // case 1 context.sql("select cast(a as array) from tbl").show() // HiveContext => org.apache.spark.sql.AnalysisException: cannot recognize input near 'array' '<' 'string' in primitive type specification; line 1 pos 17 // SQLContext => OK // case 2 context.sql("select 'a\\'b'").show() // HiveContext => OK // SQLContext => failure: ``union'' expected but ErrorToken(unclosed string literal) found // case 3 df.selectExpr("cast(a as array)").show() // OK with HiveContext and SQLContext // case 4 df.selectExpr("'a\\'b'").show() // HiveContext, SQLContext => failure: end of input expected } {code} > SQLContext and HiveContext parse a query string differently > --- > > Key: SPARK-16386 > URL: https://issues.apache.org/jira/browse/SPARK-16386 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0, 1.6.1, 1.6.2 > Environment: scala 2.10, 2.11 >Reporter: Hao Ren > Labels: patch > > I just want to figure out why the two contexts behavior differently even on a > simple query. > In a netshell, I have a query in which there is a String containing single > quote and casting to Array/Map. > I have tried all the combination of diff type of sql context and query call > api (sql, df.select, df.selectExpr). > I can't find one rules all. > Here is the code for reproducing the problem. 
> {code: java} > import org.apache.spark.sql.SQLContext > import org.apache.spark.sql.hive.HiveContext > import org.apache.spark.{SparkConf, SparkContext} > object Test extends App { > val sc = new SparkContext("local[2]", "test", new SparkConf) > val hiveContext = new HiveContext(sc) > val sqlContext = new SQLContext(sc) > val context = hiveContext > // val context = sqlContext > import context.implicits._ > val df = Seq((Seq(1, 2), 2)).toDF("a", "b") > df.registerTempTable("tbl") > df.printSchema() > // case 1 > context.sql("select cast(a as array) from tbl").show() > // HiveContext => org.apache.spark.sql.AnalysisException: cannot recognize > input near 'array' '<' 'string' in primitive type specification; line 1 pos 17 > // SQLContext => OK > // case 2 > context.sql("select 'a\\'b'").show() >
[jira] [Created] (SPARK-16386) SQLContext and HiveContext parse a query string differently
Hao Ren created SPARK-16386: --- Summary: SQLContext and HiveContext parse a query string differently Key: SPARK-16386 URL: https://issues.apache.org/jira/browse/SPARK-16386 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.6.2, 1.6.1, 1.6.0 Environment: scala 2.10, 2.11 Reporter: Hao Ren I just want to figure out why the two contexts behave differently even on a simple query. In a nutshell, I have a query in which there is a String containing a single quote and a cast to Array/Map. I have tried all combinations of sql context type and query call api (sql, df.select, df.selectExpr). I can't find one that rules them all. Here is the code for reproducing the problem. {code:scala} import org.apache.spark.sql.SQLContext import org.apache.spark.sql.hive.HiveContext import org.apache.spark.{SparkConf, SparkContext} object Test extends App { val sc = new SparkContext("local[2]", "test", new SparkConf) val hiveContext = new HiveContext(sc) val sqlContext = new SQLContext(sc) val context = hiveContext // val context = sqlContext import context.implicits._ val df = Seq((Seq(1, 2), 2)).toDF("a", "b") df.registerTempTable("tbl") df.printSchema() // case 1 context.sql("select cast(a as array<string>) from tbl").show() // HiveContext => org.apache.spark.sql.AnalysisException: cannot recognize input near 'array' '<' 'string' in primitive type specification; line 1 pos 17 // SQLContext => OK // case 2 context.sql("select 'a\\'b'").show() // HiveContext => OK // SQLContext => failure: ``union'' expected but ErrorToken(unclosed string literal) found // case 3 df.selectExpr("cast(a as array<string>)").show() // OK with HiveContext and SQLContext // case 4 df.selectExpr("'a\\'b'").show() // HiveContext, SQLContext => failure: end of input expected } {code}
[jira] [Updated] (SPARK-16386) SQLContext and HiveContext parse a query string differently
[ https://issues.apache.org/jira/browse/SPARK-16386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hao Ren updated SPARK-16386: Description: I just want to figure out why the two contexts behavior differently even on a simple query. In a netshell, I have a query in which there is a String containing single quote and casting to Array/Map. I have tried all the combination of diff type of sql context and query call api (sql, df.select, df.selectExpr). I can't find one rules all. Here is the code for reproducing the problem. {code: javaj} import org.apache.spark.sql.SQLContext import org.apache.spark.sql.hive.HiveContext import org.apache.spark.{SparkConf, SparkContext} object Test extends App { val sc = new SparkContext("local[2]", "test", new SparkConf) val hiveContext = new HiveContext(sc) val sqlContext = new SQLContext(sc) val context = hiveContext // val context = sqlContext import context.implicits._ val df = Seq((Seq(1, 2), 2)).toDF("a", "b") df.registerTempTable("tbl") df.printSchema() // case 1 context.sql("select cast(a as array) from tbl").show() // HiveContext => org.apache.spark.sql.AnalysisException: cannot recognize input near 'array' '<' 'string' in primitive type specification; line 1 pos 17 // SQLContext => OK // case 2 context.sql("select 'a\\'b'").show() // HiveContext => OK // SQLContext => failure: ``union'' expected but ErrorToken(unclosed string literal) found // case 3 df.selectExpr("cast(a as array)").show() // OK with HiveContext and SQLContext // case 4 df.selectExpr("'a\\'b'").show() // HiveContext, SQLContext => failure: end of input expected } {code} was: I just want to figure out why the two contexts behavior differently even on a simple query. In a netshell, I have a query in which there is a String containing single quote and casting to Array/Map. I have tried all the combination of diff type of sql context and query call api (sql, df.select, df.selectExpr). I can't find one rules all. 
Here is the code for reproducing the problem. {code: scala} import org.apache.spark.sql.SQLContext import org.apache.spark.sql.hive.HiveContext import org.apache.spark.{SparkConf, SparkContext} object Test extends App { val sc = new SparkContext("local[2]", "test", new SparkConf) val hiveContext = new HiveContext(sc) val sqlContext = new SQLContext(sc) val context = hiveContext // val context = sqlContext import context.implicits._ val df = Seq((Seq(1, 2), 2)).toDF("a", "b") df.registerTempTable("tbl") df.printSchema() // case 1 context.sql("select cast(a as array) from tbl").show() // HiveContext => org.apache.spark.sql.AnalysisException: cannot recognize input near 'array' '<' 'string' in primitive type specification; line 1 pos 17 // SQLContext => OK // case 2 context.sql("select 'a\\'b'").show() // HiveContext => OK // SQLContext => failure: ``union'' expected but ErrorToken(unclosed string literal) found // case 3 df.selectExpr("cast(a as array)").show() // OK with HiveContext and SQLContext // case 4 df.selectExpr("'a\\'b'").show() // HiveContext, SQLContext => failure: end of input expected } {code} > SQLContext and HiveContext parse a query string differently > --- > > Key: SPARK-16386 > URL: https://issues.apache.org/jira/browse/SPARK-16386 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0, 1.6.1, 1.6.2 > Environment: scala 2.10, 2.11 >Reporter: Hao Ren > Labels: patch > > I just want to figure out why the two contexts behavior differently even on a > simple query. > In a netshell, I have a query in which there is a String containing single > quote and casting to Array/Map. > I have tried all the combination of diff type of sql context and query call > api (sql, df.select, df.selectExpr). > I can't find one rules all. > Here is the code for reproducing the problem. 
> {code:scala} > import org.apache.spark.sql.SQLContext > import org.apache.spark.sql.hive.HiveContext > import org.apache.spark.{SparkConf, SparkContext} > object Test extends App { > val sc = new SparkContext("local[2]", "test", new SparkConf) > val hiveContext = new HiveContext(sc) > val sqlContext = new SQLContext(sc) > val context = hiveContext > // val context = sqlContext > import context.implicits._ > val df = Seq((Seq(1, 2), 2)).toDF("a", "b") > df.registerTempTable("tbl") > df.printSchema() > // case 1 > context.sql("select cast(a as array<string>) from tbl").show() > // HiveContext => org.apache.spark.sql.AnalysisException: cannot recognize > input near 'array' '<' 'string' in primitive type specification; line 1 pos 17 > // SQLContext => OK > // case 2 > context.sql("select 'a\\'b'").show()
[jira] [Commented] (SPARK-11857) Remove Mesos fine-grained mode subject to discussions
[ https://issues.apache.org/jira/browse/SPARK-11857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363404#comment-15363404 ] Adam McElwee commented on SPARK-11857: -- Nah, carry on. I'm not using spark at the new job, so I won't be able to pass along any new info. > Remove Mesos fine-grained mode subject to discussions > - > > Key: SPARK-11857 > URL: https://issues.apache.org/jira/browse/SPARK-11857 > Project: Spark > Issue Type: Sub-task > Components: Mesos >Reporter: Reynold Xin >Assignee: Reynold Xin > > See discussions in > http://apache-spark-developers-list.1001551.n3.nabble.com/Removing-the-Mesos-fine-grained-mode-td15277.html > and > http://apache-spark-developers-list.1001551.n3.nabble.com/Please-reply-if-you-use-Mesos-fine-grained-mode-td14930.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16324) regexp_extract should doc that it returns empty string when match fails
[ https://issues.apache.org/jira/browse/SPARK-16324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-16324: -- Issue Type: Improvement (was: Bug) Summary: regexp_extract should doc that it returns empty string when match fails (was: regexp_extract returns empty string when match fails) > regexp_extract should doc that it returns empty string when match fails > --- > > Key: SPARK-16324 > URL: https://issues.apache.org/jira/browse/SPARK-16324 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 2.0.0 >Reporter: Max Moroz >Priority: Minor > > The documentation for regexp_extract isn't clear about how it should behave > if the regex didn't match the row. However, the Java documentation it refers > to for further detail suggests that the return value should be null if the > group wasn't matched at all, empty string if the group actually matched the > empty string, and an exception raised if the entire regex didn't match. > This would be identical to how python's own re module behaves when a > MatchObject.group() is called. > However, in practice regexp_extract() returns empty string when the match > fails. This seems to be a bug; if it was intended as a feature, it should > have been documented as such - and it was probably not a good idea since it > can result in silent bugs. > {code} > import pyspark.sql.functions as F > df = spark.createDataFrame([['abc']], ['text']) > assert df.select(F.regexp_extract('text', r'(z)', 1)).first()[0] == '' > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
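For comparison, the Python {{re}} behavior the report points to can be checked without Spark at all. A minimal plain-Python sketch (the strings here are chosen purely for illustration):

```python
import re

# No match at all: re gives back None rather than an empty string, so a
# failed match is distinguishable from a group that matched the empty string.
no_match = re.search(r'(z)', 'abc')
assert no_match is None

# The group matches the empty string: group(1) is '' by design.
empty_group = re.search(r'(z?)', 'abc')
assert empty_group.group(1) == ''
```

In regexp_extract, by contrast, both cases come back as the empty string, which is exactly the ambiguity the report describes.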
[jira] [Updated] (SPARK-15909) PySpark classpath uri incorrectly set
[ https://issues.apache.org/jira/browse/SPARK-15909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-15909: -- Priority: Minor (was: Major) This is another instance of something that isn't generally supported -- running different contexts in one process. That said, it'd be better to explicitly fail or else find some change that would make this particular issue go away. > PySpark classpath uri incorrectly set > - > > Key: SPARK-15909 > URL: https://issues.apache.org/jira/browse/SPARK-15909 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 1.6.1 >Reporter: Liam Fisk >Priority: Minor > > PySpark behaves differently if the SparkContext is created within the REPL > (vs initialised by the shell). > My conf/spark-env.sh file contains: > {code} > #!/bin/bash > export SPARK_LOCAL_IP=172.20.30.158 > export LIBPROCESS_IP=172.20.30.158 > export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so > {code} > And when running pyspark it will correctly initialize my SparkContext. > However, when I run: > {code} > from pyspark import SparkContext, SparkConf > sc.stop() > conf = ( > SparkConf() > .setMaster("mesos://zk://foo:2181/mesos") > .setAppName("Jupyter PySpark") > ) > sc = SparkContext(conf=conf) > {code} > my _spark.driver.uri_ and URL classpath will point to localhost (preventing > my mesos cluster from accessing the appropriate files) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15804) Manually added metadata not saving with parquet
[ https://issues.apache.org/jira/browse/SPARK-15804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-15804: -- Fix Version/s: (was: 2.0.0) > Manually added metadata not saving with parquet > --- > > Key: SPARK-15804 > URL: https://issues.apache.org/jira/browse/SPARK-15804 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Charlie Evans >Assignee: kevin yu > > Adding metadata with col().as(_, metadata) then saving the resultant > dataframe does not save the metadata. No error is thrown. Only see the schema > contains the metadata before saving and does not contain the metadata after > saving and loading the dataframe. Was working fine with 1.6.1. > {code} > case class TestRow(a: String, b: Int) > val rows = TestRow("a", 0) :: TestRow("b", 1) :: TestRow("c", 2) :: Nil > val df = spark.createDataFrame(rows) > import org.apache.spark.sql.types.MetadataBuilder > val md = new MetadataBuilder().putString("key", "value").build() > val dfWithMeta = df.select(col("a"), col("b").as("b", md)) > println(dfWithMeta.schema.json) > dfWithMeta.write.parquet("dfWithMeta") > val dfWithMeta2 = spark.read.parquet("dfWithMeta") > println(dfWithMeta2.schema.json) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-16098) Multiclass SVM Learning
[ https://issues.apache.org/jira/browse/SPARK-16098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-16098. --- Resolution: Won't Fix > Multiclass SVM Learning > --- > > Key: SPARK-16098 > URL: https://issues.apache.org/jira/browse/SPARK-16098 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib >Affects Versions: 2.0.0 > Environment: Spark MLLib and ML 1.6.1 >Reporter: Hayri Volkan Agun >Priority: Minor > Original Estimate: 1,512h > Remaining Estimate: 1,512h > > There exists a OneVsRest classifier for using any binary classification > classifier in multi-class classification. However, for linear SVM, using > OneVsRest may create imbalanced dataset scenarios where Spark's SVM > certainly fails. I verified this by creating a LinearSVM classifier and > implementing the predictRaw method of the ClassificationModel class. In all > experiments the results were very poor in terms of F-Measure. The only > explanation is that SVM is very sensitive to imbalanced datasets, and the > OneVsRest classifier naturally creates an imbalanced dataset. > For multi-class classification, linear SVM can be optimized to account for > imbalanced datasets. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11857) Remove Mesos fine-grained mode subject to discussions
[ https://issues.apache.org/jira/browse/SPARK-11857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363367#comment-15363367 ] Michael Gummelt commented on SPARK-11857: - I'll give [~amcelwee] a couple days to respond. [~dragos] [~skonto] [~tnachen] speak now or forever hold your peace. > Remove Mesos fine-grained mode subject to discussions > - > > Key: SPARK-11857 > URL: https://issues.apache.org/jira/browse/SPARK-11857 > Project: Spark > Issue Type: Sub-task > Components: Mesos >Reporter: Reynold Xin >Assignee: Reynold Xin > > See discussions in > http://apache-spark-developers-list.1001551.n3.nabble.com/Removing-the-Mesos-fine-grained-mode-td15277.html > and > http://apache-spark-developers-list.1001551.n3.nabble.com/Please-reply-if-you-use-Mesos-fine-grained-mode-td14930.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15730) [Spark SQL] the value of 'hiveconf' parameter in Spark-sql CLI don't take effect in spark-sql session
[ https://issues.apache.org/jira/browse/SPARK-15730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363355#comment-15363355 ] Apache Spark commented on SPARK-15730: -- User 'yhuai' has created a pull request for this issue: https://github.com/apache/spark/pull/14058 > [Spark SQL] the value of 'hiveconf' parameter in Spark-sql CLI don't take > effect in spark-sql session > - > > Key: SPARK-15730 > URL: https://issues.apache.org/jira/browse/SPARK-15730 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Yi Zhou >Priority: Critical > > {noformat} > /usr/lib/spark/bin/spark-sql -v --driver-memory 4g --executor-memory 7g > --executor-cores 5 --num-executors 31 --master yarn-client --conf > spark.yarn.executor.memoryOverhead=1024 --hiveconf RESULT_TABLE=test_result01 > spark-sql> use test; > 16/06/02 21:36:15 INFO execution.SparkSqlParser: Parsing command: use test > 16/06/02 21:36:15 INFO spark.SparkContext: Starting job: processCmd at > CliDriver.java:376 > 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Got job 2 (processCmd at > CliDriver.java:376) with 1 output partitions > 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Final stage: ResultStage 2 > (processCmd at CliDriver.java:376) > 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Parents of final stage: List() > 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Missing parents: List() > 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Submitting ResultStage 2 > (MapPartitionsRDD[8] at processCmd at CliDriver.java:376), which has no > missing parents > 16/06/02 21:36:15 INFO memory.MemoryStore: Block broadcast_2 stored as values > in memory (estimated size 3.2 KB, free 2.4 GB) > 16/06/02 21:36:15 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as > bytes in memory (estimated size 1964.0 B, free 2.4 GB) > 16/06/02 21:36:15 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in > memory on 192.168.3.11:36189 (size: 1964.0 B, free: 2.4 GB) > 16/06/02 
21:36:15 INFO spark.SparkContext: Created broadcast 2 from broadcast > at DAGScheduler.scala:1012 > 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Submitting 1 missing tasks > from ResultStage 2 (MapPartitionsRDD[8] at processCmd at CliDriver.java:376) > 16/06/02 21:36:15 INFO cluster.YarnScheduler: Adding task set 2.0 with 1 tasks > 16/06/02 21:36:15 INFO scheduler.TaskSetManager: Starting task 0.0 in stage > 2.0 (TID 2, 192.168.3.13, partition 0, PROCESS_LOCAL, 5362 bytes) > 16/06/02 21:36:15 INFO cluster.YarnClientSchedulerBackend: Launching task 2 > on executor id: 10 hostname: 192.168.3.13. > 16/06/02 21:36:16 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in > memory on hw-node3:45924 (size: 1964.0 B, free: 4.4 GB) > 16/06/02 21:36:17 INFO scheduler.TaskSetManager: Finished task 0.0 in stage > 2.0 (TID 2) in 1934 ms on 192.168.3.13 (1/1) > 16/06/02 21:36:17 INFO cluster.YarnScheduler: Removed TaskSet 2.0, whose > tasks have all completed, from pool > 16/06/02 21:36:17 INFO scheduler.DAGScheduler: ResultStage 2 (processCmd at > CliDriver.java:376) finished in 1.937 s > 16/06/02 21:36:17 INFO scheduler.DAGScheduler: Job 2 finished: processCmd at > CliDriver.java:376, took 1.962631 s > Time taken: 2.027 seconds > 16/06/02 21:36:17 INFO CliDriver: Time taken: 2.027 seconds > spark-sql> DROP TABLE IF EXISTS ${hiveconf:RESULT_TABLE}; > 16/06/02 21:36:36 INFO execution.SparkSqlParser: Parsing command: DROP TABLE > IF EXISTS ${hiveconf:RESULT_TABLE} > Error in query: > mismatched input '$' expecting {'ADD', 'AS', 'ALL', 'GROUP', 'BY', > 'GROUPING', 'SETS', 'CUBE', 'ROLLUP', 'ORDER', 'LIMIT', 'AT', 'IN', 'NO', > 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', > 'ASC', 'DESC', 'FOR', 'OUTER', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', > 'RANGE', 'ROWS', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', > 'VALUES', 'CREATE', 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', > 'DESCRIBE', 'EXPLAIN', 'FORMAT', 'LOGICAL', 
'CODEGEN', 'SHOW', 'TABLES', > 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'TO', > 'TABLESAMPLE', 'ALTER', 'RENAME', 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', > 'RESET', 'DATA', 'START', 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'IF', > 'PERCENT', 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', > 'OVERWRITE', 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', > 'RECORDREADER', 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', > 'COLLECTION', 'ITEMS', 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'EXTENDED', > 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', TEMPORARY, > 'OPTIONS', 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', > 'STORED', 'DIRECTORIES', 'LOCATION',
[jira] [Commented] (SPARK-11857) Remove Mesos fine-grained mode subject to discussions
[ https://issues.apache.org/jira/browse/SPARK-11857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363345#comment-15363345 ] Michael Gummelt commented on SPARK-11857: - For completeness, here's my theoretical analysis to augment our empirical observation that users don't mind fine-grained mode being removed. Fine-grained mode provides two benefits: 1) Slow-start: executors are brought up lazily 2) Relinquishing cores: cores are relinquished back to Mesos as Spark tasks terminate Fine-grained mode does *not* provide the following benefits, though some think it does: a) Relinquishing memory: the JVM doesn't relinquish memory, so it would be unsafe for us to resize the cgroup b) Relinquishing executors As for alternatives to the benefits, 1) is provided by dynamic allocation, though we need a better recommended setup for this, as I document here: http://apache-spark-developers-list.1001551.n3.nabble.com/HDFS-as-Shuffle-Service-td17340.html There is no alternative to 2), but we've generally found that the executor-level granularity of dynamic allocation is sufficient for most. > Remove Mesos fine-grained mode subject to discussions > - > > Key: SPARK-11857 > URL: https://issues.apache.org/jira/browse/SPARK-11857 > Project: Spark > Issue Type: Sub-task > Components: Mesos >Reporter: Reynold Xin >Assignee: Reynold Xin > > See discussions in > http://apache-spark-developers-list.1001551.n3.nabble.com/Removing-the-Mesos-fine-grained-mode-td15277.html > and > http://apache-spark-developers-list.1001551.n3.nabble.com/Please-reply-if-you-use-Mesos-fine-grained-mode-td14930.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
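The dynamic-allocation alternative mentioned in the comment above is typically switched on with a handful of Spark properties. A minimal sketch (the property names are standard Spark configuration keys; the values shown are illustrative, not a recommended setup):

```properties
# Scale the number of executors up and down instead of relying on
# fine-grained task-level scaling
spark.dynamicAllocation.enabled      true
# Dynamic allocation needs the external shuffle service so shuffle files
# survive executor removal
spark.shuffle.service.enabled        true
# Illustrative bounds on how far the executor count may scale
spark.dynamicAllocation.minExecutors 0
spark.dynamicAllocation.maxExecutors 20
```

With this in place, idle executors are released back to the cluster manager, which approximates benefit 2) at executor rather than core granularity.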
[jira] [Commented] (SPARK-11857) Remove Mesos fine-grained mode subject to discussions
[ https://issues.apache.org/jira/browse/SPARK-11857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363331#comment-15363331 ] Reynold Xin commented on SPARK-11857: - Michael can you submit a pull request to log deprecation and perhaps also update the docs? > Remove Mesos fine-grained mode subject to discussions > - > > Key: SPARK-11857 > URL: https://issues.apache.org/jira/browse/SPARK-11857 > Project: Spark > Issue Type: Sub-task > Components: Mesos >Reporter: Reynold Xin >Assignee: Reynold Xin > > See discussions in > http://apache-spark-developers-list.1001551.n3.nabble.com/Removing-the-Mesos-fine-grained-mode-td15277.html > and > http://apache-spark-developers-list.1001551.n3.nabble.com/Please-reply-if-you-use-Mesos-fine-grained-mode-td14930.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging
[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363324#comment-15363324 ] Sean Owen commented on SPARK-16379: --- Of course we'd all like to never have bugs. Nobody makes bugs on purpose. Bugs exist though, and everyone agrees something has to be fixed. This just states the obvious and is not actionable. What does that mean for _how_ to fix _this_ bug? I will make a PR in any event since I think this issue is understood by now. > Spark on mesos is broken due to race condition in Logging > - > > Key: SPARK-16379 > URL: https://issues.apache.org/jira/browse/SPARK-16379 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Stavros Kontopoulos >Priority: Blocker > Attachments: out.txt > > > This commit introduced a transient lazy log val: > https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec > This has caused problems in the past: > https://github.com/apache/spark/pull/1004 > One commit before that everything works fine. > I spotted that when my CI started to fail: > https://ci.typesafe.com/job/mit-docker-test-ref/191/ > You can easily verify it by installing mesos on your machine and try to > connect with spark shell from bin dir: > ./spark-shell --master mesos://zk://localhost:2181/mesos --conf > spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz > It gets stuck at the point where it tries to create the SparkContext. > Logging gets stuck here: > I0705 12:10:10.076617 9303 group.cpp:700] Trying to get > '/mesos/json.info_000152' in ZooKeeper > I0705 12:10:10.076920 9304 detector.cpp:479] A new leading master > (UPID=master@127.0.1.1:5050) is detected > I0705 12:10:10.076956 9303 sched.cpp:326] New master detected at > master@127.0.1.1:5050 > I0705 12:10:10.077057 9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication > I0705 12:10:10.090709 9301 sched.cpp:703] Framework registered with > 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001 > I verified it also by changing @transient lazy val log to def and it works as > expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
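The hang described above comes down to lazy initialization being guarded by a monitor: a Scala {{lazy val}} holds the enclosing object's lock for the whole duration of its initializer, so any other thread that touches the field meanwhile blocks, and if the initializer in turn waits on that thread (as with a framework callback), the two deadlock. A rough plain-Python sketch of the hazard, using a `threading.Lock` as a stand-in for the JVM monitor (all names here are illustrative, not Spark code):

```python
import threading
import time

lock = threading.Lock()            # stand-in for the monitor guarding a lazy val
in_init = threading.Event()
initialized = threading.Event()

def lazy_init():
    # The initializer runs while holding the monitor, like a Scala lazy val.
    with lock:
        in_init.set()
        time.sleep(0.2)            # simulate slow initialization work
        initialized.set()

t = threading.Thread(target=lazy_init)
t.start()
in_init.wait()                     # initialization is now underway

# A second thread touching the same lazily-initialized field blocks on the
# same monitor for the entire initialization.
blocked = not lock.acquire(timeout=0.05)
t.join()
print(blocked)                     # True: the monitor was held throughout init
```

This is also why changing `@transient lazy val log` to a `def`, as the reporter tried, sidesteps the problem: a plain method carries no initialization monitor.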
[jira] [Issue Comment Deleted] (SPARK-16379) Spark on mesos is broken due to race condition in Logging
[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stavros Kontopoulos updated SPARK-16379: Comment: was deleted (was: > We have a bug and need to address it in the best way we can see. I agree. However, how to do that is what we are discussing up to now.) > Spark on mesos is broken due to race condition in Logging > - > > Key: SPARK-16379 > URL: https://issues.apache.org/jira/browse/SPARK-16379 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Stavros Kontopoulos >Priority: Blocker > Attachments: out.txt > > > This commit introduced a transient lazy log val: > https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec > This has caused problems in the past: > https://github.com/apache/spark/pull/1004 > One commit before that everything works fine. > I spotted that when my CI started to fail: > https://ci.typesafe.com/job/mit-docker-test-ref/191/ > You can easily verify it by installing mesos on your machine and try to > connect with spark shell from bin dir: > ./spark-shell --master mesos://zk://localhost:2181/mesos --conf > spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz > It gets stuck at the point where it tries to create the SparkContext. > Logging gets stuck here: > I0705 12:10:10.076617 9303 group.cpp:700] Trying to get > '/mesos/json.info_000152' in ZooKeeper > I0705 12:10:10.076920 9304 detector.cpp:479] A new leading master > (UPID=master@127.0.1.1:5050) is detected > I0705 12:10:10.076956 9303 sched.cpp:326] New master detected at > master@127.0.1.1:5050 > I0705 12:10:10.077057 9303 sched.cpp:336] No credentials provided. > Attempting to register without authentication > I0705 12:10:10.090709 9301 sched.cpp:703] Framework registered with > 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001 > I verified it also by changing @transient lazy val log to def and it works as > expected. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11857) Remove Mesos fine-grained mode subject to discussions
[ https://issues.apache.org/jira/browse/SPARK-11857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363321#comment-15363321 ] Michael Gummelt commented on SPARK-11857: - [~amcelwee] Do you have any more input on this issue? We're moving forward with deprecating fine-grained mode, but we're willing to solve your issue first. > Remove Mesos fine-grained mode subject to discussions > - > > Key: SPARK-11857 > URL: https://issues.apache.org/jira/browse/SPARK-11857 > Project: Spark > Issue Type: Sub-task > Components: Mesos >Reporter: Reynold Xin >Assignee: Reynold Xin > > See discussions in > http://apache-spark-developers-list.1001551.n3.nabble.com/Removing-the-Mesos-fine-grained-mode-td15277.html > and > http://apache-spark-developers-list.1001551.n3.nabble.com/Please-reply-if-you-use-Mesos-fine-grained-mode-td14930.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-16379) Spark on mesos is broken due to race condition in Logging
[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363302#comment-15363302 ] Stavros Kontopoulos edited comment on SPARK-16379 at 7/5/16 9:49 PM: - There was no bug previously in the scheduler. It was working before, I guess. The project was not broken, and the best practice is to keep it that way all the time. I think we can agree on that, right? was (Author: skonto): There was no bug previously in the scheduler. It was working or not? The project was not broken and the best practice is to keep it that way all the time. Otherwise, if we don't follow best practices, we can do anything we want. I don't understand why we cannot even agree on that? > Spark on mesos is broken due to race condition in Logging > - > > Key: SPARK-16379 > URL: https://issues.apache.org/jira/browse/SPARK-16379 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Stavros Kontopoulos >Priority: Blocker > Attachments: out.txt > > > This commit introduced a transient lazy log val: > https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec > This has caused problems in the past: > https://github.com/apache/spark/pull/1004 > One commit before that everything works fine. > I spotted that when my CI started to fail: > https://ci.typesafe.com/job/mit-docker-test-ref/191/ > You can easily verify it by installing mesos on your machine and try to > connect with spark shell from bin dir: > ./spark-shell --master mesos://zk://localhost:2181/mesos --conf > spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz > It gets stuck at the point where it tries to create the SparkContext. 
> Logging gets stuck here: > I0705 12:10:10.076617 9303 group.cpp:700] Trying to get > '/mesos/json.info_000152' in ZooKeeper > I0705 12:10:10.076920 9304 detector.cpp:479] A new leading master > (UPID=master@127.0.1.1:5050) is detected > I0705 12:10:10.076956 9303 sched.cpp:326] New master detected at > master@127.0.1.1:5050 > I0705 12:10:10.077057 9303 sched.cpp:336] No credentials provided. > Attempting to register without authentication > I0705 12:10:10.090709 9301 sched.cpp:703] Framework registered with > 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001 > I verified it also by changing @transient lazy val log to def and it works as > expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging
[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363314#comment-15363314 ] Stavros Kontopoulos commented on SPARK-16379: - > Nobody is suggesting knowingly making a change that triggers a bug, so I am > not sure what this is arguing against in this context. I am just saying that now that we know it's an issue, we could revert the commit so we can do the improvements next. It's a blocker. > Spark on mesos is broken due to race condition in Logging > - > > Key: SPARK-16379 > URL: https://issues.apache.org/jira/browse/SPARK-16379 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Stavros Kontopoulos >Priority: Blocker > Attachments: out.txt > > > This commit introduced a transient lazy log val: > https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec > This has caused problems in the past: > https://github.com/apache/spark/pull/1004 > One commit before that everything works fine. > I spotted that when my CI started to fail: > https://ci.typesafe.com/job/mit-docker-test-ref/191/ > You can easily verify it by installing mesos on your machine and try to > connect with spark shell from bin dir: > ./spark-shell --master mesos://zk://localhost:2181/mesos --conf > spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz > It gets stuck at the point where it tries to create the SparkContext. > Logging gets stuck here: > I0705 12:10:10.076617 9303 group.cpp:700] Trying to get > '/mesos/json.info_000152' in ZooKeeper > I0705 12:10:10.076920 9304 detector.cpp:479] A new leading master > (UPID=master@127.0.1.1:5050) is detected > I0705 12:10:10.076956 9303 sched.cpp:326] New master detected at > master@127.0.1.1:5050 > I0705 12:10:10.077057 9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication > I0705 12:10:10.090709 9301 sched.cpp:703] Framework registered with > 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001 > I verified it also by changing @transient lazy val log to def and it works as > expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11857) Remove Mesos fine-grained mode subject to discussions
[ https://issues.apache.org/jira/browse/SPARK-11857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363311#comment-15363311 ] Michael Gummelt commented on SPARK-11857: - I endorse the deprecation. Fine-grained mode would be more useful if the JVM could shrink in memory as well as cores, but alas... We at Mesosphere haven't heard any objections from users regarding the loss of fine-grained. [~andrewor14] Please cc me if you need Mesos input. Tim is still active, I believe, but no longer at Mesosphere. I work (mostly) full-time on Spark on Mesos. > Remove Mesos fine-grained mode subject to discussions > - > > Key: SPARK-11857 > URL: https://issues.apache.org/jira/browse/SPARK-11857 > Project: Spark > Issue Type: Sub-task > Components: Mesos >Reporter: Reynold Xin >Assignee: Reynold Xin > > See discussions in > http://apache-spark-developers-list.1001551.n3.nabble.com/Removing-the-Mesos-fine-grained-mode-td15277.html > and > http://apache-spark-developers-list.1001551.n3.nabble.com/Please-reply-if-you-use-Mesos-fine-grained-mode-td14930.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-11857) Remove Mesos fine-grained mode subject to discussions
[ https://issues.apache.org/jira/browse/SPARK-11857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt reopened SPARK-11857: - > Remove Mesos fine-grained mode subject to discussions > - > > Key: SPARK-11857 > URL: https://issues.apache.org/jira/browse/SPARK-11857 > Project: Spark > Issue Type: Sub-task > Components: Mesos >Reporter: Reynold Xin >Assignee: Reynold Xin > > See discussions in > http://apache-spark-developers-list.1001551.n3.nabble.com/Removing-the-Mesos-fine-grained-mode-td15277.html > and > http://apache-spark-developers-list.1001551.n3.nabble.com/Please-reply-if-you-use-Mesos-fine-grained-mode-td14930.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging
[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363305#comment-15363305 ] Stavros Kontopoulos commented on SPARK-16379: - > We have a bug and need to address it in the best way we can see. I agree. However, how to do that is what we are discussing up to now. > Spark on mesos is broken due to race condition in Logging > - > > Key: SPARK-16379 > URL: https://issues.apache.org/jira/browse/SPARK-16379 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Stavros Kontopoulos >Priority: Blocker > Attachments: out.txt > > > This commit introduced a transient lazy log val: > https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec > This has caused problems in the past: > https://github.com/apache/spark/pull/1004 > One commit before that everything works fine. > I spotted that when my CI started to fail: > https://ci.typesafe.com/job/mit-docker-test-ref/191/ > You can easily verify it by installing mesos on your machine and try to > connect with spark shell from bin dir: > ./spark-shell --master mesos://zk://localhost:2181/mesos --conf > spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz > It gets stuck at the point where it tries to create the SparkContext. > Logging gets stuck here: > I0705 12:10:10.076617 9303 group.cpp:700] Trying to get > '/mesos/json.info_000152' in ZooKeeper > I0705 12:10:10.076920 9304 detector.cpp:479] A new leading master > (UPID=master@127.0.1.1:5050) is detected > I0705 12:10:10.076956 9303 sched.cpp:326] New master detected at > master@127.0.1.1:5050 > I0705 12:10:10.077057 9303 sched.cpp:336] No credentials provided. > Attempting to register without authentication > I0705 12:10:10.090709 9301 sched.cpp:703] Framework registered with > 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001 > I verified it also by changing @transient lazy val log to def and it works as > expected. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
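The hang described in the report can be reduced to a small Scala sketch. This is an illustrative reduction, not Spark's actual `Logging` trait, and all names below are made up: in Scala 2, forcing a `lazy val` acquires the monitor of the enclosing instance, so a thread that already holds that monitor (e.g. inside a `synchronized` scheduler callback) and a thread that is concurrently initializing the `lazy val` on the same instance can end up waiting on each other.

```scala
// Illustrative reduction of the reported hang (NOT Spark's real code;
// names and bodies are placeholders for illustration only).
trait Logging {
  // Scala 2 compiles a lazy val to roughly:
  //   def log = this.synchronized { if (!init) { _log = ...; init = true }; _log }
  // i.e. forcing the lazy val takes the monitor of the enclosing instance.
  @transient lazy val log: String = {
    Thread.sleep(100) // stand-in for slow logger construction
    "logger"
  }
}

class Scheduler extends Logging {
  // A Mesos callback running under the instance monitor.
  def registered(): Unit = this.synchronized {
    // This thread holds the Scheduler's monitor. If another thread is
    // mid-initialization of `log` (holding the same monitor) and the two
    // ever wait on each other's progress, neither can proceed: deadlock.
    println(log)
  }
}

// The workaround the reporter verified: a `def` memoizes nothing and
// acquires no monitor, so there is nothing to deadlock on.
trait LoggingDef {
  def log: String = "logger" // recomputed per call; cheap if it just looks up a logger
}
```

The reporter's observation that switching `@transient lazy val log` to `def` fixes the hang is consistent with this sketch: a `def` never touches the instance monitor.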
[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging
[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363303#comment-15363303 ] Sean Owen commented on SPARK-16379:
---
It's true, I can't say for sure the problem exists without that line. It's suspicious. In any event it seems worth doing away with it while fixing this up, which may still entail reverting the main change for safety, but also trying to prove there's no similar problem still lurking in the previous Logging approach that needs any working-around anywhere.
[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging
[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363302#comment-15363302 ] Stavros Kontopoulos commented on SPARK-16379:
---
There was no bug previously in the scheduler. It was working, wasn't it? The project was not broken, and the best practice is to keep it that way at all times. Otherwise, if we don't follow best practices, we can do anything we want. I don't understand why we can't even agree on that.
[jira] [Comment Edited] (SPARK-16379) Spark on mesos is broken due to race condition in Logging
[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363289#comment-15363289 ] Michael Gummelt edited comment on SPARK-16379 at 7/5/16 9:35 PM:
---
I say we add a new lock to synchronize on and be done with it.

The root of the issue is that deadlock detection is hard. The author of the breaking change added a critical region, and to do so safely, you have to ensure that all calling code paths haven't acquired the same lock, which is difficult (undecidable). The only process change I can imagine to fix the higher-level issue is running some sort of deadlock-detection tool in the Spark tests.

I agree we shouldn't get rid of `lazy val` completely, but it is unfortunate that you can't use them in a `synchronized` block. It's a leaky abstraction. Seems to be addressed here: http://docs.scala-lang.org/sips/pending/improved-lazy-val-initialization.html

{quote}One other thing I hope holds is that no new commit should break the project, even if it fixes something or reveals another issue.{quote}

Well, I do agree with Sean that it's on us to fix bugs revealed by external changes.
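The "add a new lock" suggestion can be sketched as follows. This is only an illustration of the idea with placeholder names, not the patch that was eventually merged: initialize the logger under a private lock object, so that logger initialization never contends for the monitor of the enclosing instance.

```scala
// Sketch of the dedicated-lock idea (illustrative names, not Spark's code):
// guard logger initialization with a private lock object instead of a lazy
// val, so initialization never competes with code that synchronizes on the
// enclosing instance (e.g. scheduler callbacks).
trait Logging {
  // @volatile is required for the double-checked pattern to be safe
  // under the Java Memory Model.
  @transient @volatile private var _log: String = _
  @transient private val logLock = new Object

  protected def log: String = {
    if (_log == null) {
      logLock.synchronized {
        if (_log == null) {   // double-checked init under the dedicated lock
          _log = "logger"     // stand-in for e.g. a LoggerFactory lookup
        }
      }
    }
    _log
  }
}
```

The design point: the instance monitor stays free for whatever the subclass synchronizes on, and `logLock` is private, so no external code path can already hold it.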
[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging
[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363296#comment-15363296 ] Sean Owen commented on SPARK-16379:
---
I don't think it's even a bad practice, any more than using {{synchronized}}. Ideally, if change A uncovers bug B then it needs to be expanded to address the bug and committed all at once. Nobody is suggesting knowingly making a change that triggers a bug, so I am not sure what this is arguing against in this context. We have a bug and need to address it in the best way we can see.
[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging
[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363290#comment-15363290 ] Michael Gummelt commented on SPARK-16379:
---
Hmmm, since that's a different lock, I don't see the possibility for deadlock in the previous code, but I'm content to relinquish the point. Concurrency is hard :)
[jira] [Comment Edited] (SPARK-16379) Spark on mesos is broken due to race condition in Logging
[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363278#comment-15363278 ] Stavros Kontopoulos edited comment on SPARK-16379 at 7/5/16 9:27 PM:
---
OK, we can be flexible; that's not the issue. A warning at least, given it has caused an issue twice. Sometimes I prefer to be more strict, but it's just a suggestion; it could be added as a warning in the project code guidelines, for example. One other thing I hope holds is that no new commit should break the project, even if it fixes something or reveals another issue.
[jira] [Comment Edited] (SPARK-16379) Spark on mesos is broken due to race condition in Logging
[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363278#comment-15363278 ] Stavros Kontopoulos edited comment on SPARK-16379 at 7/5/16 9:26 PM: - Ok we can be flexible thats not the issue. A warning at least. Given it has caused an issue twice. Sometimes i prefer to be more strict, but its just a suggestion. Could be added as a warning in the project code guidelines for example. One other thing i hope it holds is no new commit should break the project even if ti fixes something or reveals another issue. was (Author: skonto): Ok we can be flexible thats not the issue. A warning at least. Given it has caused an issue twice. Sometimes i prefer to be more strict, but its just a suggestion. Could be added as a warning in the project code guidelines for example. > Spark on mesos is broken due to race condition in Logging > - > > Key: SPARK-16379 > URL: https://issues.apache.org/jira/browse/SPARK-16379 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Stavros Kontopoulos >Priority: Blocker > Attachments: out.txt > > > This commit introduced a transient lazy log val: > https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec > This has caused problems in the past: > https://github.com/apache/spark/pull/1004 > One commit before that everything works fine. > I spotted that when my CI started to fail: > https://ci.typesafe.com/job/mit-docker-test-ref/191/ > You can easily verify it by installing mesos on your machine and try to > connect with spark shell from bin dir: > ./spark-shell --master mesos://zk://localhost:2181/mesos --conf > spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz > It gets stuck at the point where it tries to create the SparkContext. 
> Logging gets stuck here: > I0705 12:10:10.076617 9303 group.cpp:700] Trying to get > '/mesos/json.info_000152' in ZooKeeper > I0705 12:10:10.076920 9304 detector.cpp:479] A new leading master > (UPID=master@127.0.1.1:5050) is detected > I0705 12:10:10.076956 9303 sched.cpp:326] New master detected at > master@127.0.1.1:5050 > I0705 12:10:10.077057 9303 sched.cpp:336] No credentials provided. > Attempting to register without authentication > I0705 12:10:10.090709 9301 sched.cpp:703] Framework registered with > 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001 > I verified it also by changing @transient lazy val log to def and it works as > expected.
[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging
[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363278#comment-15363278 ] Stavros Kontopoulos commented on SPARK-16379: - OK, we can be flexible; that's not the issue. At least a warning, given this has caused an issue twice. Sometimes I prefer to be more strict, but it's just a suggestion; it could be added, for example, as a warning in the project code guidelines.
[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging
[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363272#comment-15363272 ] Sean Owen commented on SPARK-16379: --- Yes, but that is what I am arguing. Above you said it should be prohibited in all cases. I don't think it should be prohibited.
[jira] [Comment Edited] (SPARK-16379) Spark on mesos is broken due to race condition in Logging
[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363270#comment-15363270 ] Stavros Kontopoulos edited comment on SPARK-16379 at 7/5/16 9:18 PM: - > Arguably it's "synchronized" that is the issue here, really. Is it forbidden to use a synchronized block if I know what I am doing? The same applies to the log lazy val. If you know what you are doing, I am sure it's fine. The problem here is that we have two incompatible pieces of code, and we have to merge them somehow in order to proceed.
[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging
[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363269#comment-15363269 ] Sean Owen commented on SPARK-16379: --- Oh I also meant this as the existing workaround: https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec#diff-7d99a7c7a051e5e851aaaefb275a44a1L103
[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging
[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363270#comment-15363270 ] Stavros Kontopoulos commented on SPARK-16379: - > Arguably it's "synchronized" that is the issue here, really. Is it forbidden to use a synchronized block if I know what I am doing? The same applies to the log lazy val. If you know what you are doing, I am sure it's fine.
[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging
[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363266#comment-15363266 ] Sean Owen commented on SPARK-16379: --- I mean this: https://github.com/apache/spark/blob/044971eca0ff3c2ce62afa665dbd3072d52cbbec/core/src/main/scala/org/apache/spark/internal/Logging.scala#L94
[jira] [Comment Edited] (SPARK-16379) Spark on mesos is broken due to race condition in Logging
[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363258#comment-15363258 ] Michael Gummelt edited comment on SPARK-16379 at 7/5/16 9:15 PM: - > it's entirely possible that code has a bug that's only revealed when some > other legitimate change happens Of course, but I still don't see the bug that existed previously. Perhaps `synchronized` was unnecessary, but I still see no race condition or deadlock in the previous code. Maybe following up on this will help: > The previous code also involved acquiring a lock Link? I don't see this. Or do you just mean the null check? https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec#diff-bfd5810d8aa78ad90150e806d830bb78L45
[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging
[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363256#comment-15363256 ] Michael Gummelt commented on SPARK-16379: - > The previous code also involved acquiring a lock Link? I don't see this. Or do you just mean the null check? https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec#diff-bfd5810d8aa78ad90150e806d830bb78L45
[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging
[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363258#comment-15363258 ] Michael Gummelt commented on SPARK-16379: - > The previous code also involved acquiring a lock Link? I don't see this. Or do you just mean the null check? https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec#diff-bfd5810d8aa78ad90150e806d830bb78L45
[jira] [Issue Comment Deleted] (SPARK-16379) Spark on mesos is broken due to race condition in Logging
[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt updated SPARK-16379: Comment: was deleted (was: > The previous code also involved acquiring a lock Link? I don't see this. Or do you just mean the null check? https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec#diff-bfd5810d8aa78ad90150e806d830bb78L45 )
[jira] [Commented] (SPARK-16363) Spark-submit doesn't work with IAM Roles
[ https://issues.apache.org/jira/browse/SPARK-16363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363249#comment-15363249 ] Ewan Leith commented on SPARK-16363: I'm not sure this is a major issue, but try running with the filesystem scheme s3a://; it looks like you're using the legacy JetS3t-based s3:// filesystem, which I'm fairly sure doesn't support IAM roles. > Spark-submit doesn't work with IAM Roles > > > Key: SPARK-16363 > URL: https://issues.apache.org/jira/browse/SPARK-16363 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 1.6.2 > Environment: Spark Stand-Alone with EC2 instances configured with IAM > Roles. >Reporter: Ashic Mahtab > > When running Spark Stand-alone on EC2 boxes, > spark-submit --master spark://master-ip:7077 --class Foo > --deploy-mode cluster --verbose s3://bucket/dir/foo/jar > fails to find the jar even if AWS IAM roles are configured to allow the EC2 > boxes (that are running the Spark master and workers) access to the file in S3. > The exception is provided below. It's asking us to set keys, etc. when the > boxes are configured via IAM roles. > 16/07/04 11:44:09 ERROR ClientEndpoint: Exception from cluster was: > java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key > must be specified as the username or password (respectively) of a s3 URL, or > by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties > (respectively). > java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key > must be specified as the username or password (respectively) of a s3 URL, or > by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties > (respectively). 
> at > org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:66) > at > org.apache.hadoop.fs.s3.Jets3tFileSystemStore.initialize(Jets3tFileSystemStore.java:82) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62) > at com.sun.proxy.$Proxy5.initialize(Unknown Source) > at > org.apache.hadoop.fs.s3.S3FileSystem.initialize(S3FileSystem.java:77) > at > org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1446) > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1464) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:263) > at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1686) > at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:598) > at org.apache.spark.util.Utils$.fetchFile(Utils.scala:395) > at > org.apache.spark.deploy.worker.DriverRunner.org$apache$spark$deploy$worker$DriverRunner$$downloadUserJar(DriverRunner.scala:150) > at > org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:79)
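A hedged sketch of the suggested workaround follows: switch from the legacy s3:// (JetS3t) scheme to s3a://, whose hadoop-aws connector resolves EC2 instance-profile (IAM role) credentials through the AWS SDK provider chain, so no access keys need to be configured. The bucket, jar path, and master address below are placeholders, and the cluster is assumed to have hadoop-aws and the AWS SDK on its classpath:

```shell
# Placeholder paths -- substitute your own bucket, jar, and master address.
JAR_URL="s3a://my-bucket/jars/foo.jar"   # s3a://, not the legacy s3://

# With an IAM role attached to the EC2 instances, no keys are set; s3a's
# default credential chain finds the instance-profile credentials. The
# provider can also be pinned explicitly, as sketched here:
SUBMIT_CMD="spark-submit --master spark://master-ip:7077 \
  --deploy-mode cluster \
  --conf spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.InstanceProfileCredentialsProvider \
  --class Foo \
  $JAR_URL"

echo "$SUBMIT_CMD"
```

Running the echoed command requires a live cluster, so the sketch only assembles it; the key points are the s3a:// scheme and the `spark.hadoop.fs.s3a.*` configuration pass-through.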
[jira] [Commented] (SPARK-15425) Disallow cartesian joins by default
[ https://issues.apache.org/jira/browse/SPARK-15425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363218#comment-15363218 ] Apache Spark commented on SPARK-15425: -- User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/14057 > Disallow cartesian joins by default > --- > > Key: SPARK-15425 > URL: https://issues.apache.org/jira/browse/SPARK-15425 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Sameer Agarwal > Fix For: 2.0.0 > > > It is fairly easy for users to shoot themselves in the foot if they run > Cartesian joins. Often they might not even be aware of the join method > chosen. This happened to me a few times in the last few weeks. > It would be a good idea to disable Cartesian joins by default and require > explicit enabling via the "crossJoin" method or, in SQL, "cross join". This > however might be too large a scope for 2.0 given the timing. As a small > and quick fix, we can just have a single config option > (spark.sql.join.enableCartesian) that controls this behavior. In the future > we can implement fine-grained control. > Note that the error message should be friendly and say "Set > spark.sql.join.enableCartesian to true to turn on Cartesian joins."
[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging
[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363199#comment-15363199 ] Sean Owen commented on SPARK-16379: --- I don't agree with that logic; it's entirely possible that code has a bug that's only revealed when some other legitimate change happens, and the right subsequent change is to fix the bug. I don't think we'd ban lazy vals either. Arguably it's "synchronized" that is the issue here, really. Indeed, reverting the last patch only 'fixes' it because the code contained a hack to avoid this condition. The previous code also involved acquiring a lock, and I'm guessing it _could_ still be a problem, though less likely to come up given that the locking only happens during the first call (well, hopefully). Removing the logInfo actually removes the issue more directly than reintroducing the hack. Changing the startScheduler method is _probably_ the more correct fix, though that's less conservative. I'm not against reverting the change, just on the grounds that Logging is inherited in lots of places and so there's a risk of a repeat of this problem elsewhere, even if it may ultimately also be due to some other coding problem. I'd just rather not also reintroduce the hack.
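The lazy val vs. def distinction being debated can be made concrete. A minimal sketch in plain Java (not Spark's actual code; names are illustrative): a Scala `lazy val` compiles to roughly a guard check performed under `synchronized(this)`, so a thread forcing the logger's initialization while another thread already holds a lock the initializer needs (or vice versa) can deadlock, whereas a `def` takes no lock at all:

```java
// Sketch only: Scala's "lazy val" compiles to roughly this pattern --
// a cached field guarded by synchronized(this).
class LazyLogger {
    private Object log;  // stands in for the SLF4J Logger instance

    // ~ "@transient lazy val log": the initializer runs under the object's
    // monitor. If the first call happens while a lock-ordering cycle exists
    // with another synchronized block that also logs, the call can hang.
    Object logLazy() {
        synchronized (this) {
            if (log == null) {
                log = new Object();
            }
            return log;
        }
    }

    // ~ "def log": no caching and no lock, which is consistent with the
    // observation that changing "lazy val log" to "def" avoids the hang.
    Object logDef() {
        return new Object();
    }
}

class LazyValDemo {
    public static void main(String[] args) {
        LazyLogger l = new LazyLogger();
        System.out.println(l.logLazy() == l.logLazy()); // true: cached once
        System.out.println(l.logDef() == l.logDef());   // false: fresh each call
    }
}
```

The trade-off is visible in the demo: the lazy form returns the same cached instance (one initialization, but a monitor acquisition on the slow path), while the def form pays per-call construction cost in exchange for never touching a lock.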
[jira] [Commented] (SPARK-16367) Wheelhouse Support for PySpark
[ https://issues.apache.org/jira/browse/SPARK-16367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363189#comment-15363189 ] Semet commented on SPARK-16367:
---
This is where the magic of wheels lies:
- look at https://pypi.python.org/pypi/numpy ; there are wheels for various Python versions, 32/64 bit, Linux/Mac/Windows. Simply copy from pypi.python.org plus some drops, and that's all, no compilation needed
- no compilation is needed upon installation, and if all wheels are put in the wheelhouse archive, the installation will only consist of package unzipping (automatically handled by pip install)
- the creation of the wheelhouse is really simple: pip install wheel, and then pip wheel. I'll write a tutorial in the documentation.
I have rebased your patch actually, so the cache thing will be kept :)

> Wheelhouse Support for PySpark
> ------------------------------
> Key: SPARK-16367
> URL: https://issues.apache.org/jira/browse/SPARK-16367
> Project: Spark
> Issue Type: New Feature
> Components: Deploy, PySpark
> Affects Versions: 1.6.1, 1.6.2, 2.0.0
> Reporter: Semet
> Labels: newbie, python, python-wheel, wheelhouse
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> *Rationale*
> It is recommended, in order to deploy packages written in Scala, to build big fat jar files. This allows having all dependencies in one package, so the only "cost" is the copy time to deploy this file on every Spark node.
> On the other hand, Python deployment is more difficult once you want to use external packages, and you don't really want to mess with the IT to deploy the packages in the virtualenv of each node.
> *Previous approaches*
> I based the current proposal on the two following bugs related to this point:
> - SPARK-6764 ("Wheel support for PySpark")
> - SPARK-13587 ("Support virtualenv in PySpark")
> The first part of my proposal is to merge the two, in order to support wheel installation and virtualenv creation.
> *Uber Fat Wheelhouse for Python Deployment*
> In Python, the packaging standard is now "wheels", which goes further than the good old ".egg" files. With a wheel file (".whl"), the package is already prepared for a given architecture. You can have several wheels, each specific to an architecture or environment.
> The {{pip}} tool knows how to select the package matching the current system and how to install it very quickly. Said otherwise, a package that requires compilation of a C module, for instance, does *not* compile anything when installed from a wheel file.
> {{pip}} also provides the ability to easily generate the wheels of all packages used by a given module (inside a "virtualenv"). This is called a "wheelhouse". You can even avoid this compilation entirely and retrieve the wheels directly from pypi.python.org.
> *Developer workflow*
> Here is, in a more concrete way, my proposal from a PySpark developer's point of view:
> - you are writing a PySpark script that grows in size and dependencies. Deploying on Spark, for example, requires building numpy or Theano and other dependencies
> - to use the "Big Fat Wheelhouse" support of PySpark, you need to turn your script into a standard Python package:
> -- write a {{requirements.txt}}. I recommend specifying all package versions.
> You can use [pip-tools|https://github.com/nvie/pip-tools] to maintain the requirements.txt
> {code}
> astroid==1.4.6            # via pylint
> autopep8==1.2.4
> click==6.6                # via pip-tools
> colorama==0.3.7           # via pylint
> enum34==1.1.6             # via hypothesis
> findspark==1.0.0          # via spark-testing-base
> first==2.0.1              # via pip-tools
> hypothesis==3.4.0         # via spark-testing-base
> lazy-object-proxy==1.2.2  # via astroid
> linecache2==1.0.0         # via traceback2
> pbr==1.10.0
> pep8==1.7.0               # via autopep8
> pip-tools==1.6.5
> py==1.4.31                # via pytest
> pyflakes==1.2.3
> pylint==1.5.6
> pytest==2.9.2             # via spark-testing-base
> six==1.10.0               # via astroid, pip-tools, pylint, unittest2
> spark-testing-base==0.0.7.post2
> traceback2==1.4.0         # via unittest2
> unittest2==1.1.0          # via spark-testing-base
> wheel==0.29.0
> wrapt==1.10.8             # via astroid
> {code}
> -- write a setup.py with some entry points or packages. Use [PBR|http://docs.openstack.org/developer/pbr/]; it makes the job of maintaining a setup.py file really easy
> -- create a virtualenv if not already in one:
> {code}
> virtualenv env
> {code}
> -- work on your environment, define the requirements you need in {{requirements.txt}}, do all the {{pip install}} you need
> - create the wheelhouse for your current project:
> {code}
> pip install wheel
> pip wheel . --wheel-dir wheelhouse
> {code}
> This can take some
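To complete the picture, the install side of the workflow sketched above could look roughly like this. This is a hedged sketch, not part of the proposal's actual patch: the flags are standard pip options, while the `wheelhouse` directory name is illustrative.

```shell
# Build the wheelhouse from the project's pinned requirements
# (the `wheel` package must be installed for `pip wheel` to work).
pip install wheel
pip wheel . --wheel-dir wheelhouse

# On the nodes, install strictly from the wheelhouse:
# --no-index forbids any network access to PyPI, and
# --find-links points pip at the local wheel files instead.
pip install --no-index --find-links=wheelhouse -r requirements.txt
```

Because every dependency is already compiled into a wheel, the node-side install reduces to unzipping packages, which is the whole point of shipping a wheelhouse archive.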
[jira] [Assigned] (SPARK-16385) NoSuchMethodException thrown by Utils.waitForProcess
[ https://issues.apache.org/jira/browse/SPARK-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16385: Assignee: (was: Apache Spark) > NoSuchMethodException thrown by Utils.waitForProcess > > > Key: SPARK-16385 > URL: https://issues.apache.org/jira/browse/SPARK-16385 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Marcelo Vanzin > > The code in Utils.waitForProcess catches the wrong exception: when using > reflection, {{NoSuchMethodException}} is thrown, but the code catches > {{NoSuchMethodError}}.
[jira] [Commented] (SPARK-16385) NoSuchMethodException thrown by Utils.waitForProcess
[ https://issues.apache.org/jira/browse/SPARK-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363154#comment-15363154 ] Apache Spark commented on SPARK-16385: -- User 'vanzin' has created a pull request for this issue: https://github.com/apache/spark/pull/14056
[jira] [Assigned] (SPARK-16385) NoSuchMethodException thrown by Utils.waitForProcess
[ https://issues.apache.org/jira/browse/SPARK-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16385: Assignee: Apache Spark
[jira] [Created] (SPARK-16385) NoSuchMethodException thrown by Utils.waitForProcess
Marcelo Vanzin created SPARK-16385: -- Summary: NoSuchMethodException thrown by Utils.waitForProcess Key: SPARK-16385 URL: https://issues.apache.org/jira/browse/SPARK-16385 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.0.0 Reporter: Marcelo Vanzin The code in Utils.waitForProcess catches the wrong exception: when using reflection, {{NoSuchMethodException}} is thrown, but the code catches {{NoSuchMethodError}}.
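The distinction matters because {{NoSuchMethodError}} is a linkage {{Error}} raised by the JVM when resolving a compiled call site, while {{NoSuchMethodException}} is the checked exception that {{Class.getMethod}} throws when a reflective lookup fails. A minimal sketch of the correct catch, not Spark's actual code:

```java
import java.util.concurrent.TimeUnit;

public class ReflectionLookup {
    public static void main(String[] args) {
        try {
            // Reflective probe for Process.waitFor(long, TimeUnit), the
            // Java 8 overload Utils.waitForProcess looks for. A failed
            // lookup raises the checked NoSuchMethodException; catching
            // NoSuchMethodError instead lets the exception escape.
            Process.class.getMethod("waitFor", long.class, TimeUnit.class);
            System.out.println("overload present (Java 8+)");
        } catch (NoSuchMethodException e) {
            System.out.println("overload missing (Java 7)"); // fall back to polling
        }
    }
}
```

On Java 7 the lookup fails and the `catch` branch runs, which is exactly the code path the stack trace in the comments shows blowing up when the wrong type is caught.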
[jira] [Commented] (SPARK-16385) NoSuchMethodException thrown by Utils.waitForProcess
[ https://issues.apache.org/jira/browse/SPARK-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363153#comment-15363153 ] Marcelo Vanzin commented on SPARK-16385: Here's what I see when running unit tests on java 7: {noformat} Exception in thread "ExecutorRunner for app-20160705131725-/1" java.lang.NoSuchMethodException: java.lang.Process.waitFor(long, java.util.concurrent.TimeUnit) at java.lang.Class.getMethod(Class.java:1678) at org.apache.spark.util.Utils$.waitForProcess(Utils.scala:1812) at org.apache.spark.util.Utils$.terminateProcess(Utils.scala:1783) at org.apache.spark.deploy.worker.ExecutorRunner.org$apache$spark$deploy$worker$ExecutorRunner$$killProcess(ExecutorRunner.scala:101) at org.apache.spark.deploy.worker.ExecutorRunner.org$apache$spark$deploy$worker$ExecutorRunner$$fetchAndRunExecutor(ExecutorRunner.scala:185) at org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:73) {noformat}