[jira] [Commented] (SPARK-16007) Empty DataFrame created with spark.read.csv() does not respect user specified schema

2016-07-05 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363825#comment-15363825
 ] 

Reynold Xin commented on SPARK-16007:
-

Is this done yet?


> Empty DataFrame created with spark.read.csv() does not respect user specified 
> schema
> 
>
> Key: SPARK-16007
> URL: https://issues.apache.org/jira/browse/SPARK-16007
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>Priority: Minor
>
> {{spark.read.schema(someSchema).csv().schema != someSchema}}
> The schema of the empty DF created with {{csv()}} has no fields.
> This problem also occurs for JSON, text, Parquet, and ORC.
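A minimal Scala sketch of the reported symptom (the schema and the assertion below are illustrative, not part of the original report):

{code}
import org.apache.spark.sql.types._

// Hypothetical user-specified schema.
val someSchema = new StructType().add("a", IntegerType).add("b", StringType)

// Reading with no input path yields an empty DataFrame; its schema should
// still equal the user-specified one, but the report says it has no fields.
val df = spark.read.schema(someSchema).csv()
assert(df.schema == someSchema)
{code}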






[jira] [Updated] (SPARK-16007) Empty DataFrame created with spark.read.csv() does not respect user specified schema

2016-07-05 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-16007:

Target Version/s: 2.1.0  (was: 2.0.0)

> Empty DataFrame created with spark.read.csv() does not respect user specified 
> schema
> 
>
> Key: SPARK-16007
> URL: https://issues.apache.org/jira/browse/SPARK-16007
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>Priority: Minor
>
> {{spark.read.schema(someSchema).csv().schema != someSchema}}
> The schema of the empty DF created with {{csv()}} has no fields.
> This problem also occurs for JSON, text, Parquet, and ORC.






[jira] [Commented] (SPARK-16387) Reserved SQL words are not escaped by JDBC writer

2016-07-05 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363817#comment-15363817
 ] 

Dongjoon Hyun commented on SPARK-16387:
---

Then, could you make a PR for this?

> Reserved SQL words are not escaped by JDBC writer
> -
>
> Key: SPARK-16387
> URL: https://issues.apache.org/jira/browse/SPARK-16387
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Lev
>
> Here is the code (imports are omitted):
> object Main extends App {
>   val sqlSession = SparkSession.builder().config(new SparkConf().
> setAppName("Sql Test").set("spark.app.id", "SQLTest").
> set("spark.master", "local[2]").
> set("spark.ui.enabled", "false")
> .setJars(Seq("/mysql/mysql-connector-java-5.1.38.jar" ))
>   ).getOrCreate()
>   import sqlSession.implicits._
>   val localprops = new Properties
>   localprops.put("user", "")
>   localprops.put("password", "")
>   val df = sqlSession.createDataset(Seq("a","b","c")).toDF("order")
>   val writer = df.write
>   .mode(SaveMode.Append)
>   writer
>   .jdbc("jdbc:mysql://localhost:3306/test3", s"jira_test", localprops)
> }
> The resulting error is:
> com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error 
> in your SQL syntax; check the manual that corresponds to your MySQL server 
> version for the right syntax to use near 'order TEXT )' at line 1
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> Clearly the reserved word {{order}} has to be quoted.






[jira] [Commented] (SPARK-16387) Reserved SQL words are not escaped by JDBC writer

2016-07-05 Thread Lev (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363802#comment-15363802
 ] 

Lev commented on SPARK-16387:
-

The JdbcDialect class has functionality that allows DB-dependent quotation. 
Please note that the quotation has to be applied to all SQL statement 
generation code.
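A minimal sketch (not Spark's actual JdbcUtils code) of what applying dialect-dependent quotation to all generated SQL would look like; the quoting function and DDL helper below are hypothetical:

{code}
// Hypothetical dialect-supplied quoting hook (here: ANSI double quotes).
def quote(identifier: String): String = "\"" + identifier + "\""

// Every generated identifier (table and column names) goes through the hook,
// so a column named "order" no longer breaks the generated DDL.
def createTableDdl(table: String, columns: Seq[(String, String)]): String = {
  val cols = columns.map { case (name, sqlType) => s"${quote(name)} $sqlType" }.mkString(", ")
  s"CREATE TABLE ${quote(table)} ($cols)"
}

// createTableDdl("jira_test", Seq("order" -> "TEXT"))
//   => CREATE TABLE "jira_test" ("order" TEXT)
{code}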

> Reserved SQL words are not escaped by JDBC writer
> -
>
> Key: SPARK-16387
> URL: https://issues.apache.org/jira/browse/SPARK-16387
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Lev
>
> Here is the code (imports are omitted):
> object Main extends App {
>   val sqlSession = SparkSession.builder().config(new SparkConf().
> setAppName("Sql Test").set("spark.app.id", "SQLTest").
> set("spark.master", "local[2]").
> set("spark.ui.enabled", "false")
> .setJars(Seq("/mysql/mysql-connector-java-5.1.38.jar" ))
>   ).getOrCreate()
>   import sqlSession.implicits._
>   val localprops = new Properties
>   localprops.put("user", "")
>   localprops.put("password", "")
>   val df = sqlSession.createDataset(Seq("a","b","c")).toDF("order")
>   val writer = df.write
>   .mode(SaveMode.Append)
>   writer
>   .jdbc("jdbc:mysql://localhost:3306/test3", s"jira_test", localprops)
> }
> The resulting error is:
> com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error 
> in your SQL syntax; check the manual that corresponds to your MySQL server 
> version for the right syntax to use near 'order TEXT )' at line 1
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> Clearly the reserved word {{order}} has to be quoted.






[jira] [Resolved] (SPARK-16389) Remove useless `MetastoreRelation` from `SparkHiveWriterContainer` and `SparkHiveDynamicPartitionWriterContainer`

2016-07-05 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-16389.
-
   Resolution: Fixed
Fix Version/s: 2.1.0

Issue resolved by pull request 14062
[https://github.com/apache/spark/pull/14062]

> Remove useless `MetastoreRelation` from `SparkHiveWriterContainer` and 
> `SparkHiveDynamicPartitionWriterContainer`
> -
>
> Key: SPARK-16389
> URL: https://issues.apache.org/jira/browse/SPARK-16389
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>Priority: Minor
> Fix For: 2.1.0
>
>
> - Remove useless `MetastoreRelation` from the signature of 
> `SparkHiveWriterContainer` and `SparkHiveDynamicPartitionWriterContainer`. 
> - Avoid extra metadata retrieval using Hive client in `InsertIntoHiveTable`. 






[jira] [Resolved] (SPARK-16340) In regexp_replace function column and/or column expression should also allowed as replacement.

2016-07-05 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-16340.
-
   Resolution: Fixed
 Assignee: Dongjoon Hyun
Fix Version/s: 2.1.0

> In regexp_replace function column and/or column expression should also 
> allowed as replacement.
> --
>
> Key: SPARK-16340
> URL: https://issues.apache.org/jira/browse/SPARK-16340
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.2
>Reporter: Mukul Garg
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 2.1.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, the regexp_replace function only takes a string argument as the 
> replacement, but in Hive it also accepts any column or column expression. It 
> also works in Spark, but only as a SQL query. We need to create an overload 
> of this function which also accepts a Column as the replacement.
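As the description notes, the column-based form already works when expressed as a SQL query; here is a small sketch of that workaround (the DataFrame and column names are hypothetical), pending a {{Column}} overload of the function:

{code}
import org.apache.spark.sql.functions.expr

// Hypothetical DataFrame `df` with string columns `text` and `replacement`.
// The SQL function accepts a column reference for the replacement argument,
// so it can be used through expr() until a Column overload is added.
val replaced = df.select(expr("regexp_replace(text, 'foo', replacement)").as("text_replaced"))
{code}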






[jira] [Commented] (SPARK-16340) In regexp_replace function column and/or column expression should also allowed as replacement.

2016-07-05 Thread Mukul Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363775#comment-15363775
 ] 

Mukul Garg commented on SPARK-16340:


I have checked the PR. This is what I requested. Thanks for taking this. :)

> In regexp_replace function column and/or column expression should also 
> allowed as replacement.
> --
>
> Key: SPARK-16340
> URL: https://issues.apache.org/jira/browse/SPARK-16340
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.2
>Reporter: Mukul Garg
>Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, the regexp_replace function only takes a string argument as the 
> replacement, but in Hive it also accepts any column or column expression. It 
> also works in Spark, but only as a SQL query. We need to create an overload 
> of this function which also accepts a Column as the replacement.






[jira] [Commented] (SPARK-16371) IS NOT NULL clause gives false for nested not empty column

2016-07-05 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363768#comment-15363768
 ] 

Hyukjin Kwon commented on SPARK-16371:
--

Sorry for being noisy, here is the Scala version

{code}
case class Parent(a: Child)
case class Child(a: Long)
spark.range(10).map(num => 
Parent(Child(num))).write.mode("overwrite").parquet("/tmp/test")
spark.read.parquet("/tmp/test").where("a is not null").count() // 0
{code}

It seems it fails to apply the Parquet filter when the inner column name and 
the outer column name are the same.

I will look into this deeper.

> IS NOT NULL clause gives false for nested not empty column
> --
>
> Key: SPARK-16371
> URL: https://issues.apache.org/jira/browse/SPARK-16371
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Maciej Bryński
>Priority: Critical
>
> I have a df where column1 is a struct type and there are 1M rows.
> (sample data from https://issues.apache.org/jira/browse/SPARK-16320)
> {code}
> df.where("column1 is not null").count()
> {code}
> gives:
> 1M in Spark 1.6
> *0* in Spark 2.0
> Is there a change in IS NOT NULL behaviour in Spark 2.0 ?






[jira] [Comment Edited] (SPARK-16371) IS NOT NULL clause gives false for nested not empty column

2016-07-05 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363757#comment-15363757
 ] 

Hyukjin Kwon edited comment on SPARK-16371 at 7/6/16 4:34 AM:
--

Here is a shorter version of the code

{code}
from pyspark.sql.functions import struct

child_df = spark.range(10)
parent_df = child_df.select(struct("id").alias("id"))
parent_df.write.mode('overwrite').parquet('/tmp/test')
parent_df = spark.read.parquet('/tmp/test')

parent_df.where("id is not null").count() # 0
parent_df.count() # 10
{code}


was (Author: hyukjin.kwon):
Here is a shorter version of the code

{code}
from pyspark.sql.functions import struct
from pyspark.sql import Row

path = '/tmp/test'
rdd = sc.parallelize(range(10))   
data = rdd.map(lambda r: Row(column0=r))

child_df = spark.createDataFrame(data)
parent_df = child_df.select(struct("column0").alias("column0"))

parent_df.write.mode('overwrite').parquet(path)
parent_df = spark.read.parquet(path)
parent_df.where("column0 is not null").count()
{code}

> IS NOT NULL clause gives false for nested not empty column
> --
>
> Key: SPARK-16371
> URL: https://issues.apache.org/jira/browse/SPARK-16371
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Maciej Bryński
>Priority: Critical
>
> I have a df where column1 is a struct type and there are 1M rows.
> (sample data from https://issues.apache.org/jira/browse/SPARK-16320)
> {code}
> df.where("column1 is not null").count()
> {code}
> gives:
> 1M in Spark 1.6
> *0* in Spark 2.0
> Is there a change in IS NOT NULL behaviour in Spark 2.0 ?






[jira] [Commented] (SPARK-16371) IS NOT NULL clause gives false for nested not empty column

2016-07-05 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363757#comment-15363757
 ] 

Hyukjin Kwon commented on SPARK-16371:
--

Here is a shorter version of the code

{code}
from pyspark.sql.functions import struct
from pyspark.sql import Row

path = '/tmp/test'
rdd = sc.parallelize(range(10))   
data = rdd.map(lambda r: Row(column0=r))

child_df = spark.createDataFrame(data)
parent_df = child_df.select(struct("column0").alias("column0"))

parent_df.write.mode('overwrite').parquet(path)
parent_df = spark.read.parquet(path)
parent_df.where("column0 is not null").count()
{code}

> IS NOT NULL clause gives false for nested not empty column
> --
>
> Key: SPARK-16371
> URL: https://issues.apache.org/jira/browse/SPARK-16371
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Maciej Bryński
>Priority: Critical
>
> I have a df where column1 is a struct type and there are 1M rows.
> (sample data from https://issues.apache.org/jira/browse/SPARK-16320)
> {code}
> df.where("column1 is not null").count()
> {code}
> gives:
> 1M in Spark 1.6
> *0* in Spark 2.0
> Is there a change in IS NOT NULL behaviour in Spark 2.0 ?






[jira] [Updated] (SPARK-16389) Remove useless `MetastoreRelation` from `SparkHiveWriterContainer` and `SparkHiveDynamicPartitionWriterContainer`

2016-07-05 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-16389:

Assignee: Xiao Li

> Remove useless `MetastoreRelation` from `SparkHiveWriterContainer` and 
> `SparkHiveDynamicPartitionWriterContainer`
> -
>
> Key: SPARK-16389
> URL: https://issues.apache.org/jira/browse/SPARK-16389
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>Priority: Minor
>
> - Remove useless `MetastoreRelation` from the signature of 
> `SparkHiveWriterContainer` and `SparkHiveDynamicPartitionWriterContainer`. 
> - Avoid extra metadata retrieval using Hive client in `InsertIntoHiveTable`. 






[jira] [Commented] (SPARK-15730) [Spark SQL] the value of 'hiveconf' parameter in Spark-sql CLI don't take effect in spark-sql session

2016-07-05 Thread Yi Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363691#comment-15363691
 ] 

Yi Zhou commented on SPARK-15730:
-

Thanks a lot [~chenghao] and [~yhuai] !

> [Spark SQL] the value of 'hiveconf' parameter in Spark-sql CLI don't take 
> effect in spark-sql session
> -
>
> Key: SPARK-15730
> URL: https://issues.apache.org/jira/browse/SPARK-15730
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Yi Zhou
>Assignee: Cheng Hao
>Priority: Critical
> Fix For: 2.0.0
>
>
> {noformat}
> /usr/lib/spark/bin/spark-sql -v --driver-memory 4g --executor-memory 7g 
> --executor-cores 5 --num-executors 31 --master yarn-client --conf 
> spark.yarn.executor.memoryOverhead=1024 --hiveconf RESULT_TABLE=test_result01
> spark-sql> use test;
> 16/06/02 21:36:15 INFO execution.SparkSqlParser: Parsing command: use test
> 16/06/02 21:36:15 INFO spark.SparkContext: Starting job: processCmd at 
> CliDriver.java:376
> 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Got job 2 (processCmd at 
> CliDriver.java:376) with 1 output partitions
> 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Final stage: ResultStage 2 
> (processCmd at CliDriver.java:376)
> 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Parents of final stage: List()
> 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Missing parents: List()
> 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Submitting ResultStage 2 
> (MapPartitionsRDD[8] at processCmd at CliDriver.java:376), which has no 
> missing parents
> 16/06/02 21:36:15 INFO memory.MemoryStore: Block broadcast_2 stored as values 
> in memory (estimated size 3.2 KB, free 2.4 GB)
> 16/06/02 21:36:15 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as 
> bytes in memory (estimated size 1964.0 B, free 2.4 GB)
> 16/06/02 21:36:15 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in 
> memory on 192.168.3.11:36189 (size: 1964.0 B, free: 2.4 GB)
> 16/06/02 21:36:15 INFO spark.SparkContext: Created broadcast 2 from broadcast 
> at DAGScheduler.scala:1012
> 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Submitting 1 missing tasks 
> from ResultStage 2 (MapPartitionsRDD[8] at processCmd at CliDriver.java:376)
> 16/06/02 21:36:15 INFO cluster.YarnScheduler: Adding task set 2.0 with 1 tasks
> 16/06/02 21:36:15 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 
> 2.0 (TID 2, 192.168.3.13, partition 0, PROCESS_LOCAL, 5362 bytes)
> 16/06/02 21:36:15 INFO cluster.YarnClientSchedulerBackend: Launching task 2 
> on executor id: 10 hostname: 192.168.3.13.
> 16/06/02 21:36:16 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in 
> memory on hw-node3:45924 (size: 1964.0 B, free: 4.4 GB)
> 16/06/02 21:36:17 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 
> 2.0 (TID 2) in 1934 ms on 192.168.3.13 (1/1)
> 16/06/02 21:36:17 INFO cluster.YarnScheduler: Removed TaskSet 2.0, whose 
> tasks have all completed, from pool
> 16/06/02 21:36:17 INFO scheduler.DAGScheduler: ResultStage 2 (processCmd at 
> CliDriver.java:376) finished in 1.937 s
> 16/06/02 21:36:17 INFO scheduler.DAGScheduler: Job 2 finished: processCmd at 
> CliDriver.java:376, took 1.962631 s
> Time taken: 2.027 seconds
> 16/06/02 21:36:17 INFO CliDriver: Time taken: 2.027 seconds
> spark-sql> DROP TABLE IF EXISTS ${hiveconf:RESULT_TABLE};
> 16/06/02 21:36:36 INFO execution.SparkSqlParser: Parsing command: DROP TABLE 
> IF EXISTS ${hiveconf:RESULT_TABLE}
> Error in query:
> mismatched input '$' expecting {'ADD', 'AS', 'ALL', 'GROUP', 'BY', 
> 'GROUPING', 'SETS', 'CUBE', 'ROLLUP', 'ORDER', 'LIMIT', 'AT', 'IN', 'NO', 
> 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', 
> 'ASC', 'DESC', 'FOR', 'OUTER', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 
> 'RANGE', 'ROWS', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 
> 'VALUES', 'CREATE', 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 
> 'DESCRIBE', 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'SHOW', 'TABLES', 
> 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'TO', 
> 'TABLESAMPLE', 'ALTER', 'RENAME', 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 
> 'RESET', 'DATA', 'START', 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'IF', 
> 'PERCENT', 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 
> 'OVERWRITE', 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 
> 'RECORDREADER', 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 
> 'COLLECTION', 'ITEMS', 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'EXTENDED', 
> 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', TEMPORARY, 
> 'OPTIONS', 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 
> 'STORED', 'DIRECTORIES', 'LOCATION', 

[jira] [Commented] (SPARK-16387) Reserved SQL words are not escaped by JDBC writer

2016-07-05 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363682#comment-15363682
 ] 

Dongjoon Hyun commented on SPARK-16387:
---

Hi,
Escaping sounds possible, but it is not an easy issue to implement portably 
for all databases.
We need to support MySQL, PostgreSQL, MSSQL, and so on.
The standard is the double quote ("), but even MySQL does not support that by 
default (only in ANSI mode?).
MySQL uses the backtick (`), but PostgreSQL does not (if I remember correctly); 
MSSQL uses '[]'.

I want to help you with this issue, but I have no concrete idea yet. Do you 
have any idea for this?
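A rough sketch of the per-database differences mentioned above (the function is illustrative only, not an existing Spark API; the quote characters follow the comment):

{code}
// Identifier quoting differs per database, so a single hard-coded character
// cannot work portably; each JDBC dialect would have to supply its own rule.
def quoteIdentifier(dialect: String, name: String): String = dialect match {
  case "mysql"      => s"`$name`"          // backtick
  case "postgresql" => "\"" + name + "\""  // ANSI double quote
  case "mssql"      => s"[$name]"          // square brackets
  case _            => "\"" + name + "\""  // fall back to the ANSI standard
}
{code}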

> Reserved SQL words are not escaped by JDBC writer
> -
>
> Key: SPARK-16387
> URL: https://issues.apache.org/jira/browse/SPARK-16387
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Lev
>
> Here is the code (imports are omitted):
> object Main extends App {
>   val sqlSession = SparkSession.builder().config(new SparkConf().
> setAppName("Sql Test").set("spark.app.id", "SQLTest").
> set("spark.master", "local[2]").
> set("spark.ui.enabled", "false")
> .setJars(Seq("/mysql/mysql-connector-java-5.1.38.jar" ))
>   ).getOrCreate()
>   import sqlSession.implicits._
>   val localprops = new Properties
>   localprops.put("user", "")
>   localprops.put("password", "")
>   val df = sqlSession.createDataset(Seq("a","b","c")).toDF("order")
>   val writer = df.write
>   .mode(SaveMode.Append)
>   writer
>   .jdbc("jdbc:mysql://localhost:3306/test3", s"jira_test", localprops)
> }
> The resulting error is:
> com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error 
> in your SQL syntax; check the manual that corresponds to your MySQL server 
> version for the right syntax to use near 'order TEXT )' at line 1
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> Clearly the reserved word {{order}} has to be quoted.






[jira] [Updated] (SPARK-16286) Implement stack table generating function

2016-07-05 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-16286:

Assignee: Dongjoon Hyun

> Implement stack table generating function
> -
>
> Key: SPARK-16286
> URL: https://issues.apache.org/jira/browse/SPARK-16286
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Dongjoon Hyun
> Fix For: 2.1.0
>
>







[jira] [Resolved] (SPARK-16286) Implement stack table generating function

2016-07-05 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-16286.
-
   Resolution: Fixed
Fix Version/s: 2.1.0

Issue resolved by pull request 14033
[https://github.com/apache/spark/pull/14033]

> Implement stack table generating function
> -
>
> Key: SPARK-16286
> URL: https://issues.apache.org/jira/browse/SPARK-16286
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Dongjoon Hyun
> Fix For: 2.1.0
>
>







[jira] [Assigned] (SPARK-16389) Remove useless `MetastoreRelation` from `SparkHiveWriterContainer` and `SparkHiveDynamicPartitionWriterContainer`

2016-07-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16389:


Assignee: (was: Apache Spark)

> Remove useless `MetastoreRelation` from `SparkHiveWriterContainer` and 
> `SparkHiveDynamicPartitionWriterContainer`
> -
>
> Key: SPARK-16389
> URL: https://issues.apache.org/jira/browse/SPARK-16389
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Priority: Minor
>
> - Remove useless `MetastoreRelation` from the signature of 
> `SparkHiveWriterContainer` and `SparkHiveDynamicPartitionWriterContainer`. 
> - Avoid extra metadata retrieval using Hive client in `InsertIntoHiveTable`. 






[jira] [Commented] (SPARK-16389) Remove useless `MetastoreRelation` from `SparkHiveWriterContainer` and `SparkHiveDynamicPartitionWriterContainer`

2016-07-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363594#comment-15363594
 ] 

Apache Spark commented on SPARK-16389:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/14062

> Remove useless `MetastoreRelation` from `SparkHiveWriterContainer` and 
> `SparkHiveDynamicPartitionWriterContainer`
> -
>
> Key: SPARK-16389
> URL: https://issues.apache.org/jira/browse/SPARK-16389
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Priority: Minor
>
> - Remove useless `MetastoreRelation` from the signature of 
> `SparkHiveWriterContainer` and `SparkHiveDynamicPartitionWriterContainer`. 
> - Avoid extra metadata retrieval using Hive client in `InsertIntoHiveTable`. 






[jira] [Assigned] (SPARK-16389) Remove useless `MetastoreRelation` from `SparkHiveWriterContainer` and `SparkHiveDynamicPartitionWriterContainer`

2016-07-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16389:


Assignee: Apache Spark

> Remove useless `MetastoreRelation` from `SparkHiveWriterContainer` and 
> `SparkHiveDynamicPartitionWriterContainer`
> -
>
> Key: SPARK-16389
> URL: https://issues.apache.org/jira/browse/SPARK-16389
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>Priority: Minor
>
> - Remove useless `MetastoreRelation` from the signature of 
> `SparkHiveWriterContainer` and `SparkHiveDynamicPartitionWriterContainer`. 
> - Avoid extra metadata retrieval using Hive client in `InsertIntoHiveTable`. 






[jira] [Updated] (SPARK-15761) pyspark shell should load if PYSPARK_DRIVER_PYTHON is ipython and Python3

2016-07-05 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-15761:

Fix Version/s: (was: 2.0.1)
   2.0.0

> pyspark shell should load if PYSPARK_DRIVER_PYTHON is ipython and Python3
> 
>
> Key: SPARK-15761
> URL: https://issues.apache.org/jira/browse/SPARK-15761
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Reporter: Manoj Kumar
>Assignee: Manoj Kumar
>Priority: Minor
> Fix For: 1.6.3, 2.0.0
>
>
> My default python is ipython3 and it is odd that it fails with "IPython 
> requires Python 2.7+; please install python2.7 or set PYSPARK_PYTHON"






[jira] [Updated] (SPARK-16182) Utils.scala -- terminateProcess() should call Process.destroyForcibly() if and only if Process.destroy() fails

2016-07-05 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-16182:

Fix Version/s: (was: 2.0.1)
   2.0.0

> Utils.scala -- terminateProcess() should call Process.destroyForcibly() if 
> and only if Process.destroy() fails
> --
>
> Key: SPARK-16182
> URL: https://issues.apache.org/jira/browse/SPARK-16182
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
> Environment: OSX El Capitan (java "1.8.0_65"), Oracle Linux 6 (java 
> 1.8.0_92-b14)
>Reporter: Christian Chua
>Assignee: Sean Owen
>Priority: Critical
> Fix For: 1.6.3, 2.0.0
>
>
> Spark streaming documentation recommends application developers create static 
> connection pools. To clean up this pool, we add a shutdown hook.
> The problem is that in spark 1.6.1, the shutdown hook for an executor will be 
> called only for the first submitted job.  (on the second and subsequent job 
> submissions, the shutdown hook for the executor will NOT be invoked)
> problem not seen when using java 1.7
> problem not seen when using spark 1.6.0
> looks like this bug is caused by this modification from 1.6.0 to 1.6.1:
> https://issues.apache.org/jira/browse/SPARK-12486
> steps to reproduce the problem :
> 1.) install spark 1.6.1
> 2.) submit this basic spark application
> import org.apache.spark.{ SparkContext, SparkConf }
> object MyPool {
> def printToFile( f : java.io.File )( op : java.io.PrintWriter => Unit ) {
> val p = new java.io.PrintWriter(f)
> try {
> op(p)
> }
> finally {
> p.close()
> }
> }
> def myfunc( ) = {
> "a"
> }
> def createEvidence( ) = {
> printToFile(new java.io.File("/var/tmp/evidence.txt")) { p =>
> p.println("the evidence")
> }
> }
> sys.addShutdownHook {
> createEvidence()
> }
> }
> object BasicSpark {
> def main( args : Array[String] ) = {
> val sparkConf = new SparkConf().setAppName("BasicPi")
> val sc = new SparkContext(sparkConf)
> sc.parallelize(1 to 2).foreach { i => println("f : " + 
> MyPool.myfunc())
> }
> sc.stop()
> }
> }
> 3.) you will see that /var/tmp/evidence.txt is created
> 4.) now delete this file 
> 5.) submit a second job
> 6.) you will see that /var/tmp/evidence.txt is no longer created on the 
> second submission
> 7.) if you use java 7 or spark 1.6.0, the evidence file will be created on 
> the second and subsequent submits






[jira] [Created] (SPARK-16389) Remove useless `MetastoreRelation` from `SparkHiveWriterContainer` and `SparkHiveDynamicPartitionWriterContainer`

2016-07-05 Thread Xiao Li (JIRA)
Xiao Li created SPARK-16389:
---

 Summary: Remove useless `MetastoreRelation` from 
`SparkHiveWriterContainer` and `SparkHiveDynamicPartitionWriterContainer`
 Key: SPARK-16389
 URL: https://issues.apache.org/jira/browse/SPARK-16389
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.0.0
Reporter: Xiao Li
Priority: Minor


- Remove useless `MetastoreRelation` from the signature of 
`SparkHiveWriterContainer` and `SparkHiveDynamicPartitionWriterContainer`. 
- Avoid extra metadata retrieval using Hive client in `InsertIntoHiveTable`. 







[jira] [Updated] (SPARK-16353) Intended javadoc options are not honored for Java unidoc

2016-07-05 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-16353:

Fix Version/s: (was: 2.0.1)
   2.0.0

> Intended javadoc options are not honored for Java unidoc
> 
>
> Key: SPARK-16353
> URL: https://issues.apache.org/jira/browse/SPARK-16353
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Documentation
>Affects Versions: 1.6.2, 2.0.0, 2.0.1, 2.1.0
>Reporter: Michael Allman
>Assignee: Michael Allman
>Priority: Minor
> Fix For: 1.6.3, 2.0.0
>
>
> {{project/SparkBuild.scala}} specifies
> {noformat}
> javacOptions in doc := Seq(
>   "-windowtitle", "Spark " + version.value.replaceAll("-SNAPSHOT", "") + 
> "JavaDoc",
>   "-public",
>   "-noqualifier", "java.lang"
> )
> {noformat}
> However, {{sbt javaunidoc:doc}} ignores these options. To wit, the title of 
> http://spark.apache.org/docs/latest/api/java/index.html is {{Generated 
> Documentation (Untitled)}}, not {{Spark 1.6.2 JavaDoc}} as it should be.
> (N.B. the Spark 1.6.2 build defines several javadoc groups as well, which are 
> also missing from the official docs.)






[jira] [Commented] (SPARK-16382) YARN - Dynamic allocation with spark.executor.instances should increase max executors.

2016-07-05 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363592#comment-15363592
 ] 

Saisai Shao commented on SPARK-16382:
-

I would suggest failing and complaining. Max usually specifies the upper bound 
of resources that can be used by Spark, and it usually should not be exceeded.

Also, in YarnSparkHadoopUtil.scala, we have such a constraint:

{code}
  require(initialNumExecutors >= minNumExecutors && initialNumExecutors <= 
maxNumExecutors,
s"initial executor number $initialNumExecutors must between min 
executor number " +
  s"$minNumExecutors and max executor number $maxNumExecutors")
{code}
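A sketch of the analogous fail-fast check being suggested for {{spark.executor.instances}} (illustrative only, not actual Spark code; the variable names are hypothetical):

{code}
  // Hypothetical check: complain when --num-executors exceeds the
  // dynamic-allocation maximum instead of silently accepting it.
  require(numExecutors <= maxNumExecutors,
    s"executor number $numExecutors must not exceed max executor number $maxNumExecutors")
{code}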


> YARN - Dynamic allocation with spark.executor.instances should increase max 
> executors.
> --
>
> Key: SPARK-16382
> URL: https://issues.apache.org/jira/browse/SPARK-16382
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Reporter: Ryan Blue
>
> SPARK-13723 changed the behavior of dynamic allocation when 
> {{--num-executors}} ({{spark.executor.instances}}) is set. Rather than 
> turning off dynamic allocation, the value is used as the initial number of 
> executors. This did not change the behavior of 
> {{spark.dynamicAllocation.maxExecutors}}. We've noticed that some users set 
> {{--num-executors}} higher than the max and the expectation is that the max 
> increases.
> I think that either the max should be increased, or Spark should fail and 
> complain that the number of executors requested is higher than the max.






[jira] [Assigned] (SPARK-16388) Remove spark.sql.nativeView and spark.sql.nativeView.canonical config

2016-07-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16388:


Assignee: Apache Spark  (was: Reynold Xin)

> Remove spark.sql.nativeView and spark.sql.nativeView.canonical config
> -
>
> Key: SPARK-16388
> URL: https://issues.apache.org/jira/browse/SPARK-16388
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Apache Spark
>
> These two configs should not be relevant anymore after Spark 2.0.






[jira] [Assigned] (SPARK-16388) Remove spark.sql.nativeView and spark.sql.nativeView.canonical config

2016-07-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16388:


Assignee: Reynold Xin  (was: Apache Spark)

> Remove spark.sql.nativeView and spark.sql.nativeView.canonical config
> -
>
> Key: SPARK-16388
> URL: https://issues.apache.org/jira/browse/SPARK-16388
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> These two configs should not be relevant anymore after Spark 2.0.






[jira] [Commented] (SPARK-16388) Remove spark.sql.nativeView and spark.sql.nativeView.canonical config

2016-07-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363585#comment-15363585
 ] 

Apache Spark commented on SPARK-16388:
--

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/14061

> Remove spark.sql.nativeView and spark.sql.nativeView.canonical config
> -
>
> Key: SPARK-16388
> URL: https://issues.apache.org/jira/browse/SPARK-16388
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> These two configs should not be relevant anymore after Spark 2.0.






[jira] [Created] (SPARK-16388) Remove spark.sql.nativeView and spark.sql.nativeView.canonical config

2016-07-05 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-16388:
---

 Summary: Remove spark.sql.nativeView and 
spark.sql.nativeView.canonical config
 Key: SPARK-16388
 URL: https://issues.apache.org/jira/browse/SPARK-16388
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Reynold Xin
Assignee: Reynold Xin


These two configs should not be relevant anymore after Spark 2.0.







[jira] [Commented] (SPARK-16371) IS NOT NULL clause gives false for nested not empty column

2016-07-05 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363548#comment-15363548
 ] 

Hyukjin Kwon commented on SPARK-16371:
--

[~maver1ck] [~proflin] I could reproduce this. I will try to narrow it down.

> IS NOT NULL clause gives false for nested not empty column
> --
>
> Key: SPARK-16371
> URL: https://issues.apache.org/jira/browse/SPARK-16371
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Maciej Bryński
>Priority: Critical
>
> I have a df where column1 is a struct type and there are 1M rows.
> (sample data from https://issues.apache.org/jira/browse/SPARK-16320)
> {code}
> df.where("column1 is not null").count()
> {code}
> gives:
> 1M in Spark 1.6
> *0* in Spark 2.0
> Is there a change in IS NOT NULL behaviour in Spark 2.0 ?






[jira] [Commented] (SPARK-16367) Wheelhouse Support for PySpark

2016-07-05 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363509#comment-15363509
 ] 

Jeff Zhang commented on SPARK-16367:


Preparing the wheelhouse seems time-consuming to me, especially when many 
packages are needed and they themselves have dependencies as well. If the 
internet is accessible, I would rather ask the cluster to do that.
[~gae...@xeberon.net] Do you know whether a local repository is supported by 
Python, so that a cluster administrator can create a private wheelhouse that 
all the machines in the cluster can access, just like a private Maven 
repository?

> Wheelhouse Support for PySpark
> --
>
> Key: SPARK-16367
> URL: https://issues.apache.org/jira/browse/SPARK-16367
> Project: Spark
>  Issue Type: New Feature
>  Components: Deploy, PySpark
>Affects Versions: 1.6.1, 1.6.2, 2.0.0
>Reporter: Semet
>  Labels: newbie, python, python-wheel, wheelhouse
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> *Rationale*
> To deploy packages written in Scala, it is recommended to build big fat jar 
> files. This allows having all dependencies in one package, so the only "cost" 
> is the copy time to deploy this file on every Spark node.
> On the other hand, Python deployment is more difficult once you want to use 
> external packages, and you don't really want to mess with IT to deploy the 
> packages into the virtualenv of each node.
> *Previous approaches*
> I based the current proposal on the two following issues related to this 
> point:
> - SPARK-6764 ("Wheel support for PySpark")
> - SPARK-13587 ("Support virtualenv in PySpark")
> The first part of my proposal is to merge them, in order to support wheel 
> installation and virtualenv creation.
> *Uber Fat Wheelhouse for Python Deployment*
> In Python, the packaging standard is now "wheels", which goes further than 
> the good old ".egg" files. With a wheel file (".whl"), the package is already 
> prepared for a given architecture. You can have several wheels, each specific 
> to an architecture or environment.
> The {{pip}} tool knows how to select the package matching the current system 
> and how to install it very quickly. Said otherwise, a package that requires 
> compilation of a C module, for instance, does *not* compile anything when 
> installing from a wheel file.
> {{pip}} also provides the ability to easily generate all wheels of all 
> packages used for a given module (inside a "virtualenv"). This is called a 
> "wheelhouse". You can even skip this compilation and retrieve the wheels 
> directly from pypi.python.org.
> *Developer workflow*
> Here is, more concretely, my proposal from a PySpark developer's point of 
> view:
> - you are writing a PySpark script that grows in size and dependencies. 
> Deploying it on Spark, for example, requires building numpy or Theano and 
> other dependencies
> - to use the "Big Fat Wheelhouse" support of PySpark, you need to turn your 
> script into a standard Python package:
> -- write a {{requirements.txt}}. I recommend specifying all package versions. 
> You can use [pip-tools|https://github.com/nvie/pip-tools] to maintain the 
> requirements.txt
> {code}
> astroid==1.4.6# via pylint
> autopep8==1.2.4
> click==6.6# via pip-tools
> colorama==0.3.7   # via pylint
> enum34==1.1.6 # via hypothesis
> findspark==1.0.0  # via spark-testing-base
> first==2.0.1  # via pip-tools
> hypothesis==3.4.0 # via spark-testing-base
> lazy-object-proxy==1.2.2  # via astroid
> linecache2==1.0.0 # via traceback2
> pbr==1.10.0
> pep8==1.7.0   # via autopep8
> pip-tools==1.6.5
> py==1.4.31# via pytest
> pyflakes==1.2.3
> pylint==1.5.6
> pytest==2.9.2 # via spark-testing-base
> six==1.10.0   # via astroid, pip-tools, pylint, unittest2
> spark-testing-base==0.0.7.post2
> traceback2==1.4.0 # via unittest2
> unittest2==1.1.0  # via spark-testing-base
> wheel==0.29.0
> wrapt==1.10.8 # via astroid
> {code}
> -- write a setup.py with some entry points or packages. Use 
> [PBR|http://docs.openstack.org/developer/pbr/]; it makes the job of 
> maintaining a setup.py file really easy
> -- create a virtualenv if not already in one:
> {code}
> virtualenv env
> {code}
> -- Work on your environment, define the requirement you need in 
> {{requirements.txt}}, do all the {{pip install}} you need.
> - create the wheelhouse for your current project
> {code}
> pip install wheelhouse
> pip wheel . --wheel-dir wheelhouse
> {code}
> This can take some time, but at the end you have all the .whl files required 
> *for your current system*
> - zip it into a {{wheelhouse.zip}}.
> Note that you can have your own package (for instance 

[jira] [Commented] (SPARK-16367) Wheelhouse Support for PySpark

2016-07-05 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363513#comment-15363513
 ] 

Jeff Zhang commented on SPARK-16367:


Oh, I happened to find this project for building a local Python package 
repository. Maybe users can use it: http://doc.devpi.net/latest/


> Wheelhouse Support for PySpark
> --
>
> Key: SPARK-16367
> URL: https://issues.apache.org/jira/browse/SPARK-16367
> Project: Spark
>  Issue Type: New Feature
>  Components: Deploy, PySpark
>Affects Versions: 1.6.1, 1.6.2, 2.0.0
>Reporter: Semet
>  Labels: newbie, python, python-wheel, wheelhouse
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> *Rationale*
> To deploy packages written in Scala, it is recommended to build big fat jar 
> files. This allows having all dependencies in one package, so the only "cost" 
> is the copy time to deploy this file on every Spark node.
> On the other hand, Python deployment is more difficult once you want to use 
> external packages, and you don't really want to mess with IT to deploy the 
> packages into the virtualenv of each node.
> *Previous approaches*
> I based the current proposal on the two following issues related to this 
> point:
> - SPARK-6764 ("Wheel support for PySpark")
> - SPARK-13587 ("Support virtualenv in PySpark")
> The first part of my proposal is to merge them, in order to support wheel 
> installation and virtualenv creation.
> *Uber Fat Wheelhouse for Python Deployment*
> In Python, the packaging standard is now "wheels", which goes further than 
> the good old ".egg" files. With a wheel file (".whl"), the package is already 
> prepared for a given architecture. You can have several wheels, each specific 
> to an architecture or environment.
> The {{pip}} tool knows how to select the package matching the current system 
> and how to install it very quickly. Said otherwise, a package that requires 
> compilation of a C module, for instance, does *not* compile anything when 
> installing from a wheel file.
> {{pip}} also provides the ability to easily generate all wheels of all 
> packages used for a given module (inside a "virtualenv"). This is called a 
> "wheelhouse". You can even skip this compilation and retrieve the wheels 
> directly from pypi.python.org.
> *Developer workflow*
> Here is, more concretely, my proposal from a PySpark developer's point of 
> view:
> - you are writing a PySpark script that grows in size and dependencies. 
> Deploying it on Spark, for example, requires building numpy or Theano and 
> other dependencies
> - to use the "Big Fat Wheelhouse" support of PySpark, you need to turn your 
> script into a standard Python package:
> -- write a {{requirements.txt}}. I recommend specifying all package versions. 
> You can use [pip-tools|https://github.com/nvie/pip-tools] to maintain the 
> requirements.txt
> {code}
> astroid==1.4.6# via pylint
> autopep8==1.2.4
> click==6.6# via pip-tools
> colorama==0.3.7   # via pylint
> enum34==1.1.6 # via hypothesis
> findspark==1.0.0  # via spark-testing-base
> first==2.0.1  # via pip-tools
> hypothesis==3.4.0 # via spark-testing-base
> lazy-object-proxy==1.2.2  # via astroid
> linecache2==1.0.0 # via traceback2
> pbr==1.10.0
> pep8==1.7.0   # via autopep8
> pip-tools==1.6.5
> py==1.4.31# via pytest
> pyflakes==1.2.3
> pylint==1.5.6
> pytest==2.9.2 # via spark-testing-base
> six==1.10.0   # via astroid, pip-tools, pylint, unittest2
> spark-testing-base==0.0.7.post2
> traceback2==1.4.0 # via unittest2
> unittest2==1.1.0  # via spark-testing-base
> wheel==0.29.0
> wrapt==1.10.8 # via astroid
> {code}
> -- write a setup.py with some entry points or packages. Use 
> [PBR|http://docs.openstack.org/developer/pbr/]; it makes the job of 
> maintaining a setup.py file really easy
> -- create a virtualenv if not already in one:
> {code}
> virtualenv env
> {code}
> -- Work on your environment, define the requirement you need in 
> {{requirements.txt}}, do all the {{pip install}} you need.
> - create the wheelhouse for your current project
> {code}
> pip install wheelhouse
> pip wheel . --wheel-dir wheelhouse
> {code}
> This can take some time, but at the end you have all the .whl files required 
> *for your current system*
> - zip it into a {{wheelhouse.zip}}.
> Note that your own package (for instance 'my_package') can also be generated 
> into a wheel and so installed by {{pip}} automatically.
> Now comes the time to submit the project:
> {code}
> bin/spark-submit  --master master --deploy-mode client --files 
> /path/to/virtualenv/requirements.txt,/path/to/virtualenv/wheelhouse.zip 
> --conf "spark.pyspark.virtualenv.enabled=true" 

[jira] [Resolved] (SPARK-16348) pyspark.ml MLSerDe should be called using full classpath

2016-07-05 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley resolved SPARK-16348.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 14023
[https://github.com/apache/spark/pull/14023]

> pyspark.ml MLSerDe should be called using full classpath
> 
>
> Key: SPARK-16348
> URL: https://issues.apache.org/jira/browse/SPARK-16348
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 2.0.0
>Reporter: Joseph K. Bradley
>Assignee: Joseph K. Bradley
>Priority: Critical
> Fix For: 2.0.0
>
>
> Depending on how Spark is set up, pyspark.ml may or may not be able to find 
> the MLSerDe instance when referenced as {{sc._jvm.MLSerDe}}.  This can cause 
> failures {{'JavaPackage' object is not callable}} when trying to access 
> Vector or Matrix values from pyspark, such as retrieving the coefficients of 
> a LinearRegressionModel.
> Proposal: Whenever we reference a class in the _jvm from pyspark, we should 
> use the full classpath: {{sc._jvm.org.apache.spark.ml.python.MLSerDe}}.  This 
> fixes the bug in my case.






[jira] [Resolved] (SPARK-16385) NoSuchMethodException thrown by Utils.waitForProcess

2016-07-05 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-16385.

   Resolution: Fixed
 Assignee: Marcelo Vanzin
Fix Version/s: 2.0.0

> NoSuchMethodException thrown by Utils.waitForProcess
> 
>
> Key: SPARK-16385
> URL: https://issues.apache.org/jira/browse/SPARK-16385
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
> Fix For: 2.0.0
>
>
> The code in Utils.waitForProcess catches the wrong exception: when using 
> reflection, {{NoSuchMethodException}} is thrown, but the code catches 
> {{NoSuchMethodError}}.






[jira] [Resolved] (SPARK-16383) Remove `SessionState.executeSql`

2016-07-05 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-16383.
-
   Resolution: Fixed
 Assignee: Dongjoon Hyun
Fix Version/s: 2.1.0

> Remove `SessionState.executeSql`
> 
>
> Key: SPARK-16383
> URL: https://issues.apache.org/jira/browse/SPARK-16383
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 2.1.0
>
>
> This issue removes `SessionState.executeSql` in favor of `SparkSession.sql`.
> We can remove this safely since the visibility `SessionState` is 
> `private[sql]` and `executeSql` is only used in one **ignored** test, 
> `test("Multiple Hive Instances")`.






[jira] [Resolved] (SPARK-16359) unidoc workaround for multiple kafka versions

2016-07-05 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das resolved SPARK-16359.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 14041
[https://github.com/apache/spark/pull/14041]

> unidoc workaround for multiple kafka versions
> -
>
> Key: SPARK-16359
> URL: https://issues.apache.org/jira/browse/SPARK-16359
> Project: Spark
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Cody Koeninger
> Fix For: 2.0.0
>
>
> sbt unidoc plugin uses dependencyClasspath.all
> Because of this, having both kafka 0.8 and 0.10 dependencies on the classpath 
> causes compilation errors during unidoc.
> Need a workaround, possibly to skip 0.10 during unidoc and then try to add it 
> back later.






[jira] [Resolved] (SPARK-15730) [Spark SQL] the value of 'hiveconf' parameter in Spark-sql CLI don't take effect in spark-sql session

2016-07-05 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-15730.
-
   Resolution: Fixed
 Assignee: Cheng Hao
Fix Version/s: 2.0.0

> [Spark SQL] the value of 'hiveconf' parameter in Spark-sql CLI don't take 
> effect in spark-sql session
> -
>
> Key: SPARK-15730
> URL: https://issues.apache.org/jira/browse/SPARK-15730
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Yi Zhou
>Assignee: Cheng Hao
>Priority: Critical
> Fix For: 2.0.0
>
>
> {noformat}
> /usr/lib/spark/bin/spark-sql -v --driver-memory 4g --executor-memory 7g 
> --executor-cores 5 --num-executors 31 --master yarn-client --conf 
> spark.yarn.executor.memoryOverhead=1024 --hiveconf RESULT_TABLE=test_result01
> spark-sql> use test;
> 16/06/02 21:36:15 INFO execution.SparkSqlParser: Parsing command: use test
> 16/06/02 21:36:15 INFO spark.SparkContext: Starting job: processCmd at 
> CliDriver.java:376
> 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Got job 2 (processCmd at 
> CliDriver.java:376) with 1 output partitions
> 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Final stage: ResultStage 2 
> (processCmd at CliDriver.java:376)
> 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Parents of final stage: List()
> 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Missing parents: List()
> 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Submitting ResultStage 2 
> (MapPartitionsRDD[8] at processCmd at CliDriver.java:376), which has no 
> missing parents
> 16/06/02 21:36:15 INFO memory.MemoryStore: Block broadcast_2 stored as values 
> in memory (estimated size 3.2 KB, free 2.4 GB)
> 16/06/02 21:36:15 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as 
> bytes in memory (estimated size 1964.0 B, free 2.4 GB)
> 16/06/02 21:36:15 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in 
> memory on 192.168.3.11:36189 (size: 1964.0 B, free: 2.4 GB)
> 16/06/02 21:36:15 INFO spark.SparkContext: Created broadcast 2 from broadcast 
> at DAGScheduler.scala:1012
> 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Submitting 1 missing tasks 
> from ResultStage 2 (MapPartitionsRDD[8] at processCmd at CliDriver.java:376)
> 16/06/02 21:36:15 INFO cluster.YarnScheduler: Adding task set 2.0 with 1 tasks
> 16/06/02 21:36:15 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 
> 2.0 (TID 2, 192.168.3.13, partition 0, PROCESS_LOCAL, 5362 bytes)
> 16/06/02 21:36:15 INFO cluster.YarnClientSchedulerBackend: Launching task 2 
> on executor id: 10 hostname: 192.168.3.13.
> 16/06/02 21:36:16 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in 
> memory on hw-node3:45924 (size: 1964.0 B, free: 4.4 GB)
> 16/06/02 21:36:17 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 
> 2.0 (TID 2) in 1934 ms on 192.168.3.13 (1/1)
> 16/06/02 21:36:17 INFO cluster.YarnScheduler: Removed TaskSet 2.0, whose 
> tasks have all completed, from pool
> 16/06/02 21:36:17 INFO scheduler.DAGScheduler: ResultStage 2 (processCmd at 
> CliDriver.java:376) finished in 1.937 s
> 16/06/02 21:36:17 INFO scheduler.DAGScheduler: Job 2 finished: processCmd at 
> CliDriver.java:376, took 1.962631 s
> Time taken: 2.027 seconds
> 16/06/02 21:36:17 INFO CliDriver: Time taken: 2.027 seconds
> spark-sql> DROP TABLE IF EXISTS ${hiveconf:RESULT_TABLE};
> 16/06/02 21:36:36 INFO execution.SparkSqlParser: Parsing command: DROP TABLE 
> IF EXISTS ${hiveconf:RESULT_TABLE}
> Error in query:
> mismatched input '$' expecting {'ADD', 'AS', 'ALL', 'GROUP', 'BY', 
> 'GROUPING', 'SETS', 'CUBE', 'ROLLUP', 'ORDER', 'LIMIT', 'AT', 'IN', 'NO', 
> 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', 
> 'ASC', 'DESC', 'FOR', 'OUTER', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 
> 'RANGE', 'ROWS', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 
> 'VALUES', 'CREATE', 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 
> 'DESCRIBE', 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'SHOW', 'TABLES', 
> 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'TO', 
> 'TABLESAMPLE', 'ALTER', 'RENAME', 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 
> 'RESET', 'DATA', 'START', 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'IF', 
> 'PERCENT', 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 
> 'OVERWRITE', 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 
> 'RECORDREADER', 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 
> 'COLLECTION', 'ITEMS', 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'EXTENDED', 
> 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', TEMPORARY, 
> 'OPTIONS', 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 
> 'STORED', 'DIRECTORIES', 

[jira] [Commented] (SPARK-16340) In regexp_replace function column and/or column expression should also allowed as replacement.

2016-07-05 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363482#comment-15363482
 ] 

Dongjoon Hyun commented on SPARK-16340:
---

Please check whether the PR is what you want. :)
BTW, if it is merged, it will land in 2.1.0.

> In regexp_replace function column and/or column expression should also 
> allowed as replacement.
> --
>
> Key: SPARK-16340
> URL: https://issues.apache.org/jira/browse/SPARK-16340
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.2
>Reporter: Mukul Garg
>Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, the regexp_replace function only takes a string argument as the 
> replacement, but Hive also accepts any column or column expression. A column 
> replacement also works in Spark, but only as a SQL query. We need an overload 
> of this function that also accepts a Column as the replacement.
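
For illustration, a minimal sketch of the current function-based API and of the
SQL-expression form that already accepts a column (assuming a SparkSession named
{{spark}}; the final overload signature is up to the PR):

{code}
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq(("hello foo", "bar")).toDF("text", "replacement")

// Function-based API today: pattern and replacement must be Strings.
df.select(regexp_replace($"text", "foo", "!")).show()

// Replacing with another column's value already works as a SQL expression,
// which is what this issue asks to expose as a Column-accepting overload.
df.select(expr("regexp_replace(text, 'foo', replacement)")).show()
{code}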



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16340) In regexp_replace function column and/or column expression should also allowed as replacement.

2016-07-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363478#comment-15363478
 ] 

Apache Spark commented on SPARK-16340:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/14060

> In regexp_replace function column and/or column expression should also 
> allowed as replacement.
> --
>
> Key: SPARK-16340
> URL: https://issues.apache.org/jira/browse/SPARK-16340
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.2
>Reporter: Mukul Garg
>Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, the regexp_replace function only takes a string argument as the 
> replacement, but Hive also accepts any column or column expression. A column 
> replacement also works in Spark, but only as a SQL query. We need an overload 
> of this function that also accepts a Column as the replacement.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16340) In regexp_replace function column and/or column expression should also allowed as replacement.

2016-07-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16340:


Assignee: (was: Apache Spark)

> In regexp_replace function column and/or column expression should also 
> allowed as replacement.
> --
>
> Key: SPARK-16340
> URL: https://issues.apache.org/jira/browse/SPARK-16340
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.2
>Reporter: Mukul Garg
>Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, the regexp_replace function only takes a string argument as the 
> replacement, but Hive also accepts any column or column expression. A column 
> replacement also works in Spark, but only as a SQL query. We need an overload 
> of this function that also accepts a Column as the replacement.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16340) In regexp_replace function column and/or column expression should also allowed as replacement.

2016-07-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16340:


Assignee: Apache Spark

> In regexp_replace function column and/or column expression should also 
> allowed as replacement.
> --
>
> Key: SPARK-16340
> URL: https://issues.apache.org/jira/browse/SPARK-16340
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.2
>Reporter: Mukul Garg
>Assignee: Apache Spark
>Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, the regexp_replace function only takes a string argument as the 
> replacement, but Hive also accepts any column or column expression. A column 
> replacement also works in Spark, but only as a SQL query. We need an overload 
> of this function that also accepts a Column as the replacement.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16340) In regexp_replace function column and/or column expression should also allowed as replacement.

2016-07-05 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363469#comment-15363469
 ] 

Dongjoon Hyun commented on SPARK-16340:
---

Hi, [~mukul.ga...@gmail.com].
I'll make a PR for this issue.

> In regexp_replace function column and/or column expression should also 
> allowed as replacement.
> --
>
> Key: SPARK-16340
> URL: https://issues.apache.org/jira/browse/SPARK-16340
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.2
>Reporter: Mukul Garg
>Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, the regexp_replace function only takes a string argument as the 
> replacement, but Hive also accepts any column or column expression. A column 
> replacement also works in Spark, but only as a SQL query. We need an overload 
> of this function that also accepts a Column as the replacement.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16387) Reserved SQL words are not escaped by JDBC writer

2016-07-05 Thread Lev (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lev updated SPARK-16387:

Summary: Reserved SQL words are not escaped by JDBC writer  (was: Reserved 
words are not escaped for JDBC writer)

> Reserved SQL words are not escaped by JDBC writer
> -
>
> Key: SPARK-16387
> URL: https://issues.apache.org/jira/browse/SPARK-16387
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Lev
>
> Here is the code (imports are omitted):
> object Main extends App {
>   val sqlSession = SparkSession.builder().config(new SparkConf().
> setAppName("Sql Test").set("spark.app.id", "SQLTest").
> set("spark.master", "local[2]").
> set("spark.ui.enabled", "false")
> .setJars(Seq("/mysql/mysql-connector-java-5.1.38.jar" ))
>   ).getOrCreate()
>   import sqlSession.implicits._
>   val localprops = new Properties
>   localprops.put("user", "")
>   localprops.put("password", "")
>   val df = sqlSession.createDataset(Seq("a","b","c")).toDF("order")
>   val writer = df.write
>   .mode(SaveMode.Append)
>   writer
>   .jdbc("jdbc:mysql://localhost:3306/test3", s"jira_test", localprops)
> }
> The resulting error is:
> com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error 
> in your SQL syntax; check the manual that corresponds to your MySQL server 
> version for the right syntax to use near 'order TEXT )' at line 1
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> Clearly the reserved word has to be quoted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16387) Reserved words are not escaped for JDBC writer

2016-07-05 Thread Lev (JIRA)
Lev created SPARK-16387:
---

 Summary: Reserved words are not escaped for JDBC writer
 Key: SPARK-16387
 URL: https://issues.apache.org/jira/browse/SPARK-16387
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Lev


Here is the code (imports are omitted):
object Main extends App {
  val sqlSession = SparkSession.builder().config(new SparkConf().
setAppName("Sql Test").set("spark.app.id", "SQLTest").
set("spark.master", "local[2]").
set("spark.ui.enabled", "false")
.setJars(Seq("/mysql/mysql-connector-java-5.1.38.jar" ))
  ).getOrCreate()

  import sqlSession.implicits._

  val localprops = new Properties
  localprops.put("user", "")
  localprops.put("password", "")

  val df = sqlSession.createDataset(Seq("a","b","c")).toDF("order")
  val writer = df.write
  .mode(SaveMode.Append)
  writer
  .jdbc("jdbc:mysql://localhost:3306/test3", s"jira_test", localprops)
}


The resulting error is:
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in 
your SQL syntax; check the manual that corresponds to your MySQL server version 
for the right syntax to use near 'order TEXT )' at line 1
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)

Clearly the reserved word has to be quoted.
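
Until the writer quotes identifiers itself, a minimal sketch of a workaround for
the snippet above (the new column name is arbitrary):

{code}
// The generated DDL currently embeds the column name unquoted (e.g. "order TEXT"),
// which MySQL rejects. Renaming the offending column before the write avoids it:
val safeDf = df.withColumnRenamed("order", "order_value")
safeDf.write
  .mode(SaveMode.Append)
  .jdbc("jdbc:mysql://localhost:3306/test3", "jira_test", localprops)
{code}

The proper fix is for the JDBC writer to quote identifiers with the dialect's
quote character (backticks for MySQL) when it generates the CREATE TABLE
statement.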



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16377) Spark MLlib: MultilayerPerceptronClassifier - error while training

2016-07-05 Thread Mikhail Shiryaev (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363458#comment-15363458
 ] 

Mikhail Shiryaev commented on SPARK-16377:
--

The original issue with ArrayIndexOutOfBoundsException was due to a bug in my 
code (an inconsistency between the configured layers and the real feature count).
The issue with "ERROR StrongWolfeLineSearch" isn't reproducible yet.
Sorry for taking your time, and thank you for the quick responses.
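
For anyone hitting the same ArrayIndexOutOfBoundsException, a minimal sketch of
a consistent configuration (assuming a SparkSession named {{spark}} and a libsvm
file with 128 features and 1000 classes, as in the aloi example quoted below;
the path is illustrative):

{code}
import org.apache.spark.ml.classification.MultilayerPerceptronClassifier

// The first layer must equal the feature count and the last layer the class count.
val data = spark.read.format("libsvm")
  .option("numFeatures", "128")
  .load("data/aloi.scale")

val trainer = new MultilayerPerceptronClassifier()
  .setLayers(Array(128, 128, 1000))   // input = #features, output = #classes
  .setBlockSize(128)
  .setMaxIter(100)

val model = trainer.fit(data)
{code}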

> Spark MLlib: MultilayerPerceptronClassifier - error while training
> --
>
> Key: SPARK-16377
> URL: https://issues.apache.org/jira/browse/SPARK-16377
> Project: Spark
>  Issue Type: Bug
>  Components: ML, MLlib
>Affects Versions: 1.5.2
>Reporter: Mikhail Shiryaev
>
> Hi, 
> I am trying to train a model with MultilayerPerceptronClassifier. 
> It works on the sample data from 
> data/mllib/sample_multiclass_classification_data.txt with 4 features, 3 
> classes and layers [4, 4, 3]. 
> But when I try to use other input files with different feature and class 
> counts (for example from 
> https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html), 
> I get errors. 
> Example: 
> Input file aloi (128 features, 1000 classes, layers [128, 128, 1000]): 
> with block size = 1: 
> ERROR StrongWolfeLineSearch: Encountered bad values in function evaluation. 
> Decreasing step size to Infinity 
> ERROR LBFGS: Failure! Resetting history: breeze.optimize.FirstOrderException: 
> Line search failed 
> ERROR LBFGS: Failure again! Giving up and returning. Maybe the objective is 
> just poorly behaved? 
> with default block size = 128: 
>  java.lang.ArrayIndexOutOfBoundsException 
>   at java.lang.System.arraycopy(Native Method) 
>   at 
> org.apache.spark.ml.ann.DataStacker$$anonfun$3$$anonfun$apply$3$$anonfun$apply$4.apply(Layer.scala:629)
>  
>   at 
> org.apache.spark.ml.ann.DataStacker$$anonfun$3$$anonfun$apply$3$$anonfun$apply$4.apply(Layer.scala:628)
>  
>at scala.collection.immutable.List.foreach(List.scala:381) 
>at 
> org.apache.spark.ml.ann.DataStacker$$anonfun$3$$anonfun$apply$3.apply(Layer.scala:628)
>  
>at 
> org.apache.spark.ml.ann.DataStacker$$anonfun$3$$anonfun$apply$3.apply(Layer.scala:624)
>  
> Even if I modify the sample_multiclass_classification_data.txt file (renaming 
> all 4th features to 5th) and run with layers [5, 5, 3], I get the same errors 
> as with the file above. 
> To summarize: 
> I can't run training with the default block size and more than 4 features. 
> If I set the block size to 1, training proceeds but I get errors from LBFGS. 
> It is reproducible with Spark 1.5.2 and with the master branch on GitHub (as 
> of July 4th). 
> Has anybody already encountered this behavior? 
> Is there a bug in MultilayerPerceptronClassifier, or am I using it 
> incorrectly? 
> Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16240) model loading backward compatibility for ml.clustering.LDA

2016-07-05 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363454#comment-15363454
 ] 

Joseph K. Bradley commented on SPARK-16240:
---

+1 for adding special logic in the loading code.  That's the general plan for 
handling backwards compatibility when the model storage format changes.  Thanks!
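
A minimal sketch of the kind of special-casing meant here (the parameter names
and metadata shape are placeholders, not the actual LDA change):

{code}
// Hypothetical version-aware remapping applied while loading saved params.
def remapParams(params: Map[String, Any], savedSparkVersion: String): Map[String, Any] = {
  // Placeholder for the real 1.6 -> 2.x parameter rename.
  val renames = Map("oldParamName" -> "newParamName")
  if (savedSparkVersion.startsWith("1.6")) {
    params.map { case (name, value) => (renames.getOrElse(name, name), value) }
  } else {
    params
  }
}
{code}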

> model loading backward compatibility for ml.clustering.LDA
> --
>
> Key: SPARK-16240
> URL: https://issues.apache.org/jira/browse/SPARK-16240
> Project: Spark
>  Issue Type: Bug
>Reporter: yuhao yang
>Priority: Minor
>
> After resolving the matrix conversion issue, the LDA model still cannot load 
> 1.6 models, as one of the parameter names has changed.
> https://github.com/apache/spark/pull/12065
> We can perhaps add some special logic in the loading code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16334) [SQL] SQL query on parquet table java.lang.ArrayIndexOutOfBoundsException

2016-07-05 Thread Vladimir Ivanov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363438#comment-15363438
 ] 

Vladimir Ivanov edited comment on SPARK-16334 at 7/5/16 11:13 PM:
--

Hi, we discovered a problem with the same stack trace in Spark 2.0. In our case 
it is thrown during the DataFrame.rdd call. Moreover, it somehow depends on the 
volume of data, because it is not thrown when we change the filter criteria 
accordingly. We used Spark SQL to write these parquet files and didn't explicitly 
specify the WriterVersion option, so I believe whatever version is set by default 
was used.


was (Author: vivanov):
Hi, we discovered problem with the same stacktrace in Spark 2.0. In our case 
it's thrown during {code}DataFrame.rdd{code} call. Moreover it somehow depends 
on volume of data, because it is not thrown when we change filter criteria 
accordingly. We used SparkSQL to write these parquet files and didn't 
explicitly specify {code}WriterVersion{code} option so I believe whatever 
version is set by default was used.

> [SQL] SQL query on parquet table java.lang.ArrayIndexOutOfBoundsException
> -
>
> Key: SPARK-16334
> URL: https://issues.apache.org/jira/browse/SPARK-16334
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Egor Pahomov
>Priority: Critical
>  Labels: sql
>
> Query:
> {code}
> select * from blabla where user_id = 415706251
> {code}
> Error:
> {code}
> 16/06/30 14:07:27 WARN scheduler.TaskSetManager: Lost task 11.0 in stage 0.0 
> (TID 3, hadoop6): java.lang.ArrayIndexOutOfBoundsException: 6934
> at 
> org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary.decodeToBinary(PlainValuesDictionary.java:119)
> at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.decodeDictionaryIds(VectorizedColumnReader.java:273)
> at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBatch(VectorizedColumnReader.java:170)
> at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:230)
> at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:137)
> at 
> org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:36)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown
>  Source)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:246)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:240)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:780)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:780)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> at org.apache.spark.scheduler.Task.run(Task.scala:85)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Works on 1.6.1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16334) [SQL] SQL query on parquet table java.lang.ArrayIndexOutOfBoundsException

2016-07-05 Thread Vladimir Ivanov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363438#comment-15363438
 ] 

Vladimir Ivanov commented on SPARK-16334:
-

Hi, we discovered a problem with the same stack trace in Spark 2.0. In our case 
it is thrown during the {code}DataFrame.rdd{code} call. Moreover, it somehow 
depends on the volume of data, because it is not thrown when we change the 
filter criteria accordingly. We used Spark SQL to write these parquet files and 
didn't explicitly specify the {code}WriterVersion{code} option, so I believe 
whatever version is set by default was used.

> [SQL] SQL query on parquet table java.lang.ArrayIndexOutOfBoundsException
> -
>
> Key: SPARK-16334
> URL: https://issues.apache.org/jira/browse/SPARK-16334
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Egor Pahomov
>Priority: Critical
>  Labels: sql
>
> Query:
> {code}
> select * from blabla where user_id = 415706251
> {code}
> Error:
> {code}
> 16/06/30 14:07:27 WARN scheduler.TaskSetManager: Lost task 11.0 in stage 0.0 
> (TID 3, hadoop6): java.lang.ArrayIndexOutOfBoundsException: 6934
> at 
> org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary.decodeToBinary(PlainValuesDictionary.java:119)
> at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.decodeDictionaryIds(VectorizedColumnReader.java:273)
> at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBatch(VectorizedColumnReader.java:170)
> at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:230)
> at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:137)
> at 
> org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:36)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown
>  Source)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:246)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:240)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:780)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:780)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> at org.apache.spark.scheduler.Task.run(Task.scala:85)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Works on 1.6.1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16384) FROM_UNIXTIME reports incorrect days

2016-07-05 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363420#comment-15363420
 ] 

Dongjoon Hyun commented on SPARK-16384:
---

It's the behavior of Java `SimpleDateFormat`. Hive also returns the same result.
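
A minimal sketch of that SimpleDateFormat behavior (exact output depends on the
JVM's default locale and time zone):

{code}
import java.text.SimpleDateFormat
import java.util.Date

val d = new Date(1451260800L * 1000)          // 2015-12-28 00:00:00 UTC
new SimpleDateFormat("YYYY-MM-dd").format(d)  // 'YYYY' is the week-based year
new SimpleDateFormat("yyyy-MM-dd").format(d)  // 'yyyy' is the calendar year
{code}

With the default US locale the first line prints a 2016 date and the second a
2015 date, which matches the outputs quoted below.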

> FROM_UNIXTIME reports incorrect days
> 
>
> Key: SPARK-16384
> URL: https://issues.apache.org/jira/browse/SPARK-16384
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1, 1.6.2, 2.0.0
>Reporter: Frank Stratton
>
> Timestamps between 2015-12-27 and 2015-12-31 are reported in the incorrect 
> year (2016-12-*):
> {quote}
> results = sqlContext.sql("SELECT FROM_UNIXTIME(1451088000, 'YYYY-MM-dd')")   
> # 2015-12-26
> print results.collect()
> results = sqlContext.sql("SELECT FROM_UNIXTIME(1451174400, 'YYYY-MM-dd')")   
> # 2015-12-27
> print results.collect()
> results = sqlContext.sql("SELECT FROM_UNIXTIME(1451260800, 'YYYY-MM-dd')")   
> # 2015-12-28
> print results.collect()
> results = sqlContext.sql("SELECT FROM_UNIXTIME(1451347200, 'YYYY-MM-dd')")   
> # 2015-12-29
> print results.collect()
> {quote}
> outputs:
> {quote}
> [Row(_c0=u'2015-12-26')]
> [Row(_c0=u'2016-12-27')]
> [Row(_c0=u'2016-12-28')]
> [Row(_c0=u'2016-12-29')]
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16384) FROM_UNIXTIME reports incorrect days

2016-07-05 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363418#comment-15363418
 ] 

Dongjoon Hyun commented on SPARK-16384:
---

Hi, [~epanastasi]

That is not a bug. You should use 'yyyy' (the calendar year) instead of 'YYYY' 
(the week-based year).

{code}
scala> sql("SELECT FROM_UNIXTIME(1451260800, 'YYYY-MM-dd')").collect
res1: Array[org.apache.spark.sql.Row] = Array([2016-12-27])

scala> sql("SELECT FROM_UNIXTIME(1451260800, 'yyyy-MM-dd')").collect
res2: Array[org.apache.spark.sql.Row] = Array([2015-12-27])
{code}

> FROM_UNIXTIME reports incorrect days
> 
>
> Key: SPARK-16384
> URL: https://issues.apache.org/jira/browse/SPARK-16384
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1, 1.6.2, 2.0.0
>Reporter: Frank Stratton
>
> Timestamps between 2015-12-27 and 2015-12-31 are reported in the incorrect 
> year (2016-12-*):
> {quote}
> results = sqlContext.sql("SELECT FROM_UNIXTIME(1451088000, 'YYYY-MM-dd')")   
> # 2015-12-26
> print results.collect()
> results = sqlContext.sql("SELECT FROM_UNIXTIME(1451174400, 'YYYY-MM-dd')")   
> # 2015-12-27
> print results.collect()
> results = sqlContext.sql("SELECT FROM_UNIXTIME(1451260800, 'YYYY-MM-dd')")   
> # 2015-12-28
> print results.collect()
> results = sqlContext.sql("SELECT FROM_UNIXTIME(1451347200, 'YYYY-MM-dd')")   
> # 2015-12-29
> print results.collect()
> {quote}
> outputs:
> {quote}
> [Row(_c0=u'2015-12-26')]
> [Row(_c0=u'2016-12-27')]
> [Row(_c0=u'2016-12-28')]
> [Row(_c0=u'2016-12-29')]
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11857) Remove Mesos fine-grained mode subject to discussions

2016-07-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363410#comment-15363410
 ] 

Michael Gummelt commented on SPARK-11857:
-

Thanks!

> Remove Mesos fine-grained mode subject to discussions
> -
>
> Key: SPARK-11857
> URL: https://issues.apache.org/jira/browse/SPARK-11857
> Project: Spark
>  Issue Type: Sub-task
>  Components: Mesos
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> See discussions in
> http://apache-spark-developers-list.1001551.n3.nabble.com/Removing-the-Mesos-fine-grained-mode-td15277.html
> and
> http://apache-spark-developers-list.1001551.n3.nabble.com/Please-reply-if-you-use-Mesos-fine-grained-mode-td14930.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16386) SQLContext and HiveContext parse a query string differently

2016-07-05 Thread Hao Ren (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hao Ren updated SPARK-16386:

Description: 
I just want to figure out why the two contexts behave differently even on a 
simple query.
In a nutshell, I have a query that contains a String with a single quote and a 
cast to Array/Map.
I have tried all combinations of the different SQL contexts and query call APIs 
(sql, df.select, df.selectExpr).
I can't find one that works in all cases.

Here is the code for reproducing the problem.
{code}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.{SparkConf, SparkContext}

object Test extends App {

  val sc  = new SparkContext("local[2]", "test", new SparkConf)
  val hiveContext = new HiveContext(sc)
  val sqlContext  = new SQLContext(sc)

  val context = hiveContext
  //  val context = sqlContext

  import context.implicits._

  val df = Seq((Seq(1, 2), 2)).toDF("a", "b")
  df.registerTempTable("tbl")
  df.printSchema()

  // case 1
  context.sql("select cast(a as array) from tbl").show()
  // HiveContext => org.apache.spark.sql.AnalysisException: cannot recognize 
input near 'array' '<' 'string' in primitive type specification; line 1 pos 17
  // SQLContext => OK

  // case 2
  context.sql("select 'a\\'b'").show()
  // HiveContext => OK
  // SQLContext => failure: ``union'' expected but ErrorToken(unclosed string 
literal) found

  // case 3
  df.selectExpr("cast(a as array)").show() // OK with HiveContext and 
SQLContext

  // case 4
  df.selectExpr("'a\\'b'").show() // HiveContext, SQLContext => failure: end of 
input expected
}
{code}

  was:
I just want to figure out why the two contexts behave differently even on a 
simple query.
In a nutshell, I have a query that contains a String with a single quote and a 
cast to Array/Map.
I have tried all combinations of the different SQL contexts and query call APIs 
(sql, df.select, df.selectExpr).
I can't find one that works in all cases.

Here is the code for reproducing the problem.
{code: java}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.{SparkConf, SparkContext}

object Test extends App {

  val sc  = new SparkContext("local[2]", "test", new SparkConf)
  val hiveContext = new HiveContext(sc)
  val sqlContext  = new SQLContext(sc)

  val context = hiveContext
  //  val context = sqlContext

  import context.implicits._

  val df = Seq((Seq(1, 2), 2)).toDF("a", "b")
  df.registerTempTable("tbl")
  df.printSchema()

  // case 1
  context.sql("select cast(a as array) from tbl").show()
  // HiveContext => org.apache.spark.sql.AnalysisException: cannot recognize 
input near 'array' '<' 'string' in primitive type specification; line 1 pos 17
  // SQLContext => OK

  // case 2
  context.sql("select 'a\\'b'").show()
  // HiveContext => OK
  // SQLContext => failure: ``union'' expected but ErrorToken(unclosed string 
literal) found

  // case 3
  df.selectExpr("cast(a as array)").show() // OK with HiveContext and 
SQLContext

  // case 4
  df.selectExpr("'a\\'b'").show() // HiveContext, SQLContext => failure: end of 
input expected
}
{code}


> SQLContext and HiveContext parse a query string differently
> ---
>
> Key: SPARK-16386
> URL: https://issues.apache.org/jira/browse/SPARK-16386
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0, 1.6.1, 1.6.2
> Environment: scala 2.10, 2.11
>Reporter: Hao Ren
>  Labels: patch
>
> I just want to figure out why the two contexts behave differently even on a 
> simple query.
> In a nutshell, I have a query that contains a String with a single quote and 
> a cast to Array/Map.
> I have tried all combinations of the different SQL contexts and query call 
> APIs (sql, df.select, df.selectExpr).
> I can't find one that works in all cases.
> Here is the code for reproducing the problem.
> {code}
> import org.apache.spark.sql.SQLContext
> import org.apache.spark.sql.hive.HiveContext
> import org.apache.spark.{SparkConf, SparkContext}
> object Test extends App {
>   val sc  = new SparkContext("local[2]", "test", new SparkConf)
>   val hiveContext = new HiveContext(sc)
>   val sqlContext  = new SQLContext(sc)
>   val context = hiveContext
>   //  val context = sqlContext
>   import context.implicits._
>   val df = Seq((Seq(1, 2), 2)).toDF("a", "b")
>   df.registerTempTable("tbl")
>   df.printSchema()
>   // case 1
>   context.sql("select cast(a as array<string>) from tbl").show()
>   // HiveContext => org.apache.spark.sql.AnalysisException: cannot recognize 
> input near 'array' '<' 'string' in primitive type specification; line 1 pos 17
>   // SQLContext => OK
>   // case 2
>   context.sql("select 'a\\'b'").show()
>   // 

[jira] [Updated] (SPARK-16386) SQLContext and HiveContext parse a query string differently

2016-07-05 Thread Hao Ren (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hao Ren updated SPARK-16386:

Description: 
I just want to figure out why the two contexts behave differently even on a 
simple query.
In a nutshell, I have a query that contains a String with a single quote and a 
cast to Array/Map.
I have tried all combinations of the different SQL contexts and query call APIs 
(sql, df.select, df.selectExpr).
I can't find one that works in all cases.

Here is the code for reproducing the problem.
{code: java}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.{SparkConf, SparkContext}

object Test extends App {

  val sc  = new SparkContext("local[2]", "test", new SparkConf)
  val hiveContext = new HiveContext(sc)
  val sqlContext  = new SQLContext(sc)

  val context = hiveContext
  //  val context = sqlContext

  import context.implicits._

  val df = Seq((Seq(1, 2), 2)).toDF("a", "b")
  df.registerTempTable("tbl")
  df.printSchema()

  // case 1
  context.sql("select cast(a as array) from tbl").show()
  // HiveContext => org.apache.spark.sql.AnalysisException: cannot recognize 
input near 'array' '<' 'string' in primitive type specification; line 1 pos 17
  // SQLContext => OK

  // case 2
  context.sql("select 'a\\'b'").show()
  // HiveContext => OK
  // SQLContext => failure: ``union'' expected but ErrorToken(unclosed string 
literal) found

  // case 3
  df.selectExpr("cast(a as array)").show() // OK with HiveContext and 
SQLContext

  // case 4
  df.selectExpr("'a\\'b'").show() // HiveContext, SQLContext => failure: end of 
input expected
}
{code}

  was:
I just want to figure out why the two contexts behave differently even on a 
simple query.
In a nutshell, I have a query that contains a String with a single quote and a 
cast to Array/Map.
I have tried all combinations of the different SQL contexts and query call APIs 
(sql, df.select, df.selectExpr).
I can't find one that works in all cases.

Here is the code for reproducing the problem.
{code: javaj}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.{SparkConf, SparkContext}

object Test extends App {

  val sc  = new SparkContext("local[2]", "test", new SparkConf)
  val hiveContext = new HiveContext(sc)
  val sqlContext  = new SQLContext(sc)

  val context = hiveContext
  //  val context = sqlContext

  import context.implicits._

  val df = Seq((Seq(1, 2), 2)).toDF("a", "b")
  df.registerTempTable("tbl")
  df.printSchema()

  // case 1
  context.sql("select cast(a as array) from tbl").show()
  // HiveContext => org.apache.spark.sql.AnalysisException: cannot recognize 
input near 'array' '<' 'string' in primitive type specification; line 1 pos 17
  // SQLContext => OK

  // case 2
  context.sql("select 'a\\'b'").show()
  // HiveContext => OK
  // SQLContext => failure: ``union'' expected but ErrorToken(unclosed string 
literal) found

  // case 3
  df.selectExpr("cast(a as array)").show() // OK with HiveContext and 
SQLContext

  // case 4
  df.selectExpr("'a\\'b'").show() // HiveContext, SQLContext => failure: end of 
input expected
}
{code}


> SQLContext and HiveContext parse a query string differently
> ---
>
> Key: SPARK-16386
> URL: https://issues.apache.org/jira/browse/SPARK-16386
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0, 1.6.1, 1.6.2
> Environment: scala 2.10, 2.11
>Reporter: Hao Ren
>  Labels: patch
>
> I just want to figure out why the two contexts behave differently even on a 
> simple query.
> In a nutshell, I have a query that contains a String with a single quote and 
> a cast to Array/Map.
> I have tried all combinations of the different SQL contexts and query call 
> APIs (sql, df.select, df.selectExpr).
> I can't find one that works in all cases.
> Here is the code for reproducing the problem.
> {code: java}
> import org.apache.spark.sql.SQLContext
> import org.apache.spark.sql.hive.HiveContext
> import org.apache.spark.{SparkConf, SparkContext}
> object Test extends App {
>   val sc  = new SparkContext("local[2]", "test", new SparkConf)
>   val hiveContext = new HiveContext(sc)
>   val sqlContext  = new SQLContext(sc)
>   val context = hiveContext
>   //  val context = sqlContext
>   import context.implicits._
>   val df = Seq((Seq(1, 2), 2)).toDF("a", "b")
>   df.registerTempTable("tbl")
>   df.printSchema()
>   // case 1
>   context.sql("select cast(a as array<string>) from tbl").show()
>   // HiveContext => org.apache.spark.sql.AnalysisException: cannot recognize 
> input near 'array' '<' 'string' in primitive type specification; line 1 pos 17
>   // SQLContext => OK
>   // case 2
>   context.sql("select 'a\\'b'").show()
> 

[jira] [Created] (SPARK-16386) SQLContext and HiveContext parse a query string differently

2016-07-05 Thread Hao Ren (JIRA)
Hao Ren created SPARK-16386:
---

 Summary: SQLContext and HiveContext parse a query string 
differently
 Key: SPARK-16386
 URL: https://issues.apache.org/jira/browse/SPARK-16386
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.6.2, 1.6.1, 1.6.0
 Environment: scala 2.10, 2.11
Reporter: Hao Ren


I just want to figure out why the two contexts behave differently even on a 
simple query.
In a nutshell, I have a query that contains a String with a single quote and a 
cast to Array/Map.
I have tried all combinations of the different SQL contexts and query call APIs 
(sql, df.select, df.selectExpr).
I can't find one that works in all cases.

Here is the code for reproducing the problem.
{code: scala}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.{SparkConf, SparkContext}

object Test extends App {

  val sc  = new SparkContext("local[2]", "test", new SparkConf)
  val hiveContext = new HiveContext(sc)
  val sqlContext  = new SQLContext(sc)

  val context = hiveContext
  //  val context = sqlContext

  import context.implicits._

  val df = Seq((Seq(1, 2), 2)).toDF("a", "b")
  df.registerTempTable("tbl")
  df.printSchema()

  // case 1
  context.sql("select cast(a as array) from tbl").show()
  // HiveContext => org.apache.spark.sql.AnalysisException: cannot recognize 
input near 'array' '<' 'string' in primitive type specification; line 1 pos 17
  // SQLContext => OK

  // case 2
  context.sql("select 'a\\'b'").show()
  // HiveContext => OK
  // SQLContext => failure: ``union'' expected but ErrorToken(unclosed string 
literal) found

  // case 3
  df.selectExpr("cast(a as array)").show() // OK with HiveContext and 
SQLContext

  // case 4
  df.selectExpr("'a\\'b'").show() // HiveContext, SQLContext => failure: end of 
input expected
}
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16386) SQLContext and HiveContext parse a query string differently

2016-07-05 Thread Hao Ren (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hao Ren updated SPARK-16386:

Description: 
I just want to figure out why the two contexts behave differently even on a 
simple query.
In a nutshell, I have a query that contains a String with a single quote and a 
cast to Array/Map.
I have tried all combinations of the different SQL contexts and query call APIs 
(sql, df.select, df.selectExpr).
I can't find one that works in all cases.

Here is the code for reproducing the problem.
{code: javaj}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.{SparkConf, SparkContext}

object Test extends App {

  val sc  = new SparkContext("local[2]", "test", new SparkConf)
  val hiveContext = new HiveContext(sc)
  val sqlContext  = new SQLContext(sc)

  val context = hiveContext
  //  val context = sqlContext

  import context.implicits._

  val df = Seq((Seq(1, 2), 2)).toDF("a", "b")
  df.registerTempTable("tbl")
  df.printSchema()

  // case 1
  context.sql("select cast(a as array) from tbl").show()
  // HiveContext => org.apache.spark.sql.AnalysisException: cannot recognize 
input near 'array' '<' 'string' in primitive type specification; line 1 pos 17
  // SQLContext => OK

  // case 2
  context.sql("select 'a\\'b'").show()
  // HiveContext => OK
  // SQLContext => failure: ``union'' expected but ErrorToken(unclosed string 
literal) found

  // case 3
  df.selectExpr("cast(a as array)").show() // OK with HiveContext and 
SQLContext

  // case 4
  df.selectExpr("'a\\'b'").show() // HiveContext, SQLContext => failure: end of 
input expected
}
{code}

  was:
I just want to figure out why the two contexts behave differently even on a 
simple query.
In a nutshell, I have a query that contains a String with a single quote and a 
cast to Array/Map.
I have tried all combinations of the different SQL contexts and query call APIs 
(sql, df.select, df.selectExpr).
I can't find one that works in all cases.

Here is the code for reproducing the problem.
{code: scala}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.{SparkConf, SparkContext}

object Test extends App {

  val sc  = new SparkContext("local[2]", "test", new SparkConf)
  val hiveContext = new HiveContext(sc)
  val sqlContext  = new SQLContext(sc)

  val context = hiveContext
  //  val context = sqlContext

  import context.implicits._

  val df = Seq((Seq(1, 2), 2)).toDF("a", "b")
  df.registerTempTable("tbl")
  df.printSchema()

  // case 1
  context.sql("select cast(a as array) from tbl").show()
  // HiveContext => org.apache.spark.sql.AnalysisException: cannot recognize 
input near 'array' '<' 'string' in primitive type specification; line 1 pos 17
  // SQLContext => OK

  // case 2
  context.sql("select 'a\\'b'").show()
  // HiveContext => OK
  // SQLContext => failure: ``union'' expected but ErrorToken(unclosed string 
literal) found

  // case 3
  df.selectExpr("cast(a as array)").show() // OK with HiveContext and 
SQLContext

  // case 4
  df.selectExpr("'a\\'b'").show() // HiveContext, SQLContext => failure: end of 
input expected
}
{code}


> SQLContext and HiveContext parse a query string differently
> ---
>
> Key: SPARK-16386
> URL: https://issues.apache.org/jira/browse/SPARK-16386
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0, 1.6.1, 1.6.2
> Environment: scala 2.10, 2.11
>Reporter: Hao Ren
>  Labels: patch
>
> I just want to figure out why the two contexts behave differently even on a 
> simple query.
> In a nutshell, I have a query that contains a String with a single quote and 
> a cast to Array/Map.
> I have tried all combinations of the different SQL contexts and query call 
> APIs (sql, df.select, df.selectExpr).
> I can't find one that works in all cases.
> Here is the code for reproducing the problem.
> {code: javaj}
> import org.apache.spark.sql.SQLContext
> import org.apache.spark.sql.hive.HiveContext
> import org.apache.spark.{SparkConf, SparkContext}
> object Test extends App {
>   val sc  = new SparkContext("local[2]", "test", new SparkConf)
>   val hiveContext = new HiveContext(sc)
>   val sqlContext  = new SQLContext(sc)
>   val context = hiveContext
>   //  val context = sqlContext
>   import context.implicits._
>   val df = Seq((Seq(1, 2), 2)).toDF("a", "b")
>   df.registerTempTable("tbl")
>   df.printSchema()
>   // case 1
>   context.sql("select cast(a as array<string>) from tbl").show()
>   // HiveContext => org.apache.spark.sql.AnalysisException: cannot recognize 
> input near 'array' '<' 'string' in primitive type specification; line 1 pos 17
>   // SQLContext => OK
>   // case 2
>   context.sql("select 'a\\'b'").show()

[jira] [Commented] (SPARK-11857) Remove Mesos fine-grained mode subject to discussions

2016-07-05 Thread Adam McElwee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363404#comment-15363404
 ] 

Adam McElwee commented on SPARK-11857:
--

Nah, carry on. I'm not using spark at the new job, so I won't be able to pass 
along any new info.

> Remove Mesos fine-grained mode subject to discussions
> -
>
> Key: SPARK-11857
> URL: https://issues.apache.org/jira/browse/SPARK-11857
> Project: Spark
>  Issue Type: Sub-task
>  Components: Mesos
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> See discussions in
> http://apache-spark-developers-list.1001551.n3.nabble.com/Removing-the-Mesos-fine-grained-mode-td15277.html
> and
> http://apache-spark-developers-list.1001551.n3.nabble.com/Please-reply-if-you-use-Mesos-fine-grained-mode-td14930.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16324) regexp_extract should doc that it returns empty string when match fails

2016-07-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-16324:
--
Issue Type: Improvement  (was: Bug)
   Summary: regexp_extract should doc that it returns empty string when 
match fails  (was: regexp_extract returns empty string when match fails)

> regexp_extract should doc that it returns empty string when match fails
> ---
>
> Key: SPARK-16324
> URL: https://issues.apache.org/jira/browse/SPARK-16324
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.0.0
>Reporter: Max Moroz
>Priority: Minor
>
> The documentation for regexp_extract isn't clear about how it should behave 
> if the regex didn't match the row. However, the Java documentation it refers 
> to for further detail suggests that the return value should be null if the 
> group wasn't matched at all, an empty string if the group actually matched an 
> empty string, and an exception raised if the entire regex didn't match.
> This would be identical to how python's own re module behaves when a 
> MatchObject.group() is called.
> However, in practice regexp_extract() returns empty string when the match 
> fails. This seems to be a bug; if it was intended as a feature, it should 
> have been documented as such - and it was probably not a good idea since it 
> can result in silent bugs.
> {code}
> import pyspark.sql.functions as F
> df = spark.createDataFrame([['abc']], ['text'])
> assert df.select(F.regexp_extract('text', r'(z)', 1)).first()[0] == ''
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15909) PySpark classpath uri incorrectly set

2016-07-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-15909:
--
Priority: Minor  (was: Major)

This is another instance of something that isn't generally supported -- running 
different contexts in one process. That said, it'd be better to explicitly fail 
or else find some change that would make this particular issue go away.

> PySpark classpath uri incorrectly set
> -
>
> Key: SPARK-15909
> URL: https://issues.apache.org/jira/browse/SPARK-15909
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.6.1
>Reporter: Liam Fisk
>Priority: Minor
>
> PySpark behaves differently if the SparkContext is created within the REPL 
> (vs initialised by the shell).
> My conf/spark-env.sh file contains:
> {code}
> #!/bin/bash
> export SPARK_LOCAL_IP=172.20.30.158
> export LIBPROCESS_IP=172.20.30.158
> export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
> {code}
> And when running pyspark it will correctly initialize my SparkContext. 
> However, when I run:
> {code}
> from pyspark import SparkContext, SparkConf
> sc.stop()
> conf = (
> SparkConf()
> .setMaster("mesos://zk://foo:2181/mesos")
> .setAppName("Jupyter PySpark")
> )
> sc = SparkContext(conf=conf)
> {code}
> my _spark.driver.uri_ and URL classpath will point to localhost (preventing 
> my mesos cluster from accessing the appropriate files)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15804) Manually added metadata not saving with parquet

2016-07-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-15804:
--
Fix Version/s: (was: 2.0.0)

> Manually added metadata not saving with parquet
> ---
>
> Key: SPARK-15804
> URL: https://issues.apache.org/jira/browse/SPARK-15804
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Charlie Evans
>Assignee: kevin yu
>
> Adding metadata with col().as(_, metadata) and then saving the resulting 
> dataframe does not save the metadata. No error is thrown. The schema contains 
> the metadata before saving, but no longer contains it after saving and 
> reloading the dataframe. This was working fine with 1.6.1.
> {code}
> case class TestRow(a: String, b: Int)
> val rows = TestRow("a", 0) :: TestRow("b", 1) :: TestRow("c", 2) :: Nil
> val df = spark.createDataFrame(rows)
> import org.apache.spark.sql.types.MetadataBuilder
> val md = new MetadataBuilder().putString("key", "value").build()
> val dfWithMeta = df.select(col("a"), col("b").as("b", md))
> println(dfWithMeta.schema.json)
> dfWithMeta.write.parquet("dfWithMeta")
> val dfWithMeta2 = spark.read.parquet("dfWithMeta")
> println(dfWithMeta2.schema.json)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16098) Multiclass SVM Learning

2016-07-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-16098.
---
Resolution: Won't Fix

> Multiclass SVM Learning
> ---
>
> Key: SPARK-16098
> URL: https://issues.apache.org/jira/browse/SPARK-16098
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib
>Affects Versions: 2.0.0
> Environment: Spark MLLib and ML 1.6.1
>Reporter: Hayri Volkan Agun
>Priority: Minor
>   Original Estimate: 1,512h
>  Remaining Estimate: 1,512h
>
> There exists a OneVsRest classifier for using any binary classifier in 
> multi-class classification. However, for linear SVM, using OneVsRest may 
> create imbalanced dataset scenarios where Spark's SVM certainly fails. I 
> verified this by creating a LinearSVM classifier and implementing the 
> predictRaw method of the ClassificationModel class. In all experiments the 
> results were very poor in terms of F-measure. The only explanation is that 
> SVM is very sensitive to imbalanced datasets, and the OneVsRest classifier 
> naturally creates them. 
> For multi-class classification, linear SVM could be optimized to account for 
> imbalanced datasets.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11857) Remove Mesos fine-grained mode subject to discussions

2016-07-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363367#comment-15363367
 ] 

Michael Gummelt commented on SPARK-11857:
-

I'll give [~amcelwee] a couple days to respond.

[~dragos] [~skonto] [~tnachen] speak now or forever hold your peace.

> Remove Mesos fine-grained mode subject to discussions
> -
>
> Key: SPARK-11857
> URL: https://issues.apache.org/jira/browse/SPARK-11857
> Project: Spark
>  Issue Type: Sub-task
>  Components: Mesos
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> See discussions in
> http://apache-spark-developers-list.1001551.n3.nabble.com/Removing-the-Mesos-fine-grained-mode-td15277.html
> and
> http://apache-spark-developers-list.1001551.n3.nabble.com/Please-reply-if-you-use-Mesos-fine-grained-mode-td14930.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15730) [Spark SQL] the value of 'hiveconf' parameter in Spark-sql CLI don't take effect in spark-sql session

2016-07-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363355#comment-15363355
 ] 

Apache Spark commented on SPARK-15730:
--

User 'yhuai' has created a pull request for this issue:
https://github.com/apache/spark/pull/14058

> [Spark SQL] the value of 'hiveconf' parameter in Spark-sql CLI don't take 
> effect in spark-sql session
> -
>
> Key: SPARK-15730
> URL: https://issues.apache.org/jira/browse/SPARK-15730
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Yi Zhou
>Priority: Critical
>
> {noformat}
> /usr/lib/spark/bin/spark-sql -v --driver-memory 4g --executor-memory 7g 
> --executor-cores 5 --num-executors 31 --master yarn-client --conf 
> spark.yarn.executor.memoryOverhead=1024 --hiveconf RESULT_TABLE=test_result01
> spark-sql> use test;
> 16/06/02 21:36:15 INFO execution.SparkSqlParser: Parsing command: use test
> 16/06/02 21:36:15 INFO spark.SparkContext: Starting job: processCmd at 
> CliDriver.java:376
> 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Got job 2 (processCmd at 
> CliDriver.java:376) with 1 output partitions
> 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Final stage: ResultStage 2 
> (processCmd at CliDriver.java:376)
> 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Parents of final stage: List()
> 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Missing parents: List()
> 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Submitting ResultStage 2 
> (MapPartitionsRDD[8] at processCmd at CliDriver.java:376), which has no 
> missing parents
> 16/06/02 21:36:15 INFO memory.MemoryStore: Block broadcast_2 stored as values 
> in memory (estimated size 3.2 KB, free 2.4 GB)
> 16/06/02 21:36:15 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as 
> bytes in memory (estimated size 1964.0 B, free 2.4 GB)
> 16/06/02 21:36:15 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in 
> memory on 192.168.3.11:36189 (size: 1964.0 B, free: 2.4 GB)
> 16/06/02 21:36:15 INFO spark.SparkContext: Created broadcast 2 from broadcast 
> at DAGScheduler.scala:1012
> 16/06/02 21:36:15 INFO scheduler.DAGScheduler: Submitting 1 missing tasks 
> from ResultStage 2 (MapPartitionsRDD[8] at processCmd at CliDriver.java:376)
> 16/06/02 21:36:15 INFO cluster.YarnScheduler: Adding task set 2.0 with 1 tasks
> 16/06/02 21:36:15 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 
> 2.0 (TID 2, 192.168.3.13, partition 0, PROCESS_LOCAL, 5362 bytes)
> 16/06/02 21:36:15 INFO cluster.YarnClientSchedulerBackend: Launching task 2 
> on executor id: 10 hostname: 192.168.3.13.
> 16/06/02 21:36:16 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in 
> memory on hw-node3:45924 (size: 1964.0 B, free: 4.4 GB)
> 16/06/02 21:36:17 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 
> 2.0 (TID 2) in 1934 ms on 192.168.3.13 (1/1)
> 16/06/02 21:36:17 INFO cluster.YarnScheduler: Removed TaskSet 2.0, whose 
> tasks have all completed, from pool
> 16/06/02 21:36:17 INFO scheduler.DAGScheduler: ResultStage 2 (processCmd at 
> CliDriver.java:376) finished in 1.937 s
> 16/06/02 21:36:17 INFO scheduler.DAGScheduler: Job 2 finished: processCmd at 
> CliDriver.java:376, took 1.962631 s
> Time taken: 2.027 seconds
> 16/06/02 21:36:17 INFO CliDriver: Time taken: 2.027 seconds
> spark-sql> DROP TABLE IF EXISTS ${hiveconf:RESULT_TABLE};
> 16/06/02 21:36:36 INFO execution.SparkSqlParser: Parsing command: DROP TABLE 
> IF EXISTS ${hiveconf:RESULT_TABLE}
> Error in query:
> mismatched input '$' expecting {'ADD', 'AS', 'ALL', 'GROUP', 'BY', 
> 'GROUPING', 'SETS', 'CUBE', 'ROLLUP', 'ORDER', 'LIMIT', 'AT', 'IN', 'NO', 
> 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', 
> 'ASC', 'DESC', 'FOR', 'OUTER', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 
> 'RANGE', 'ROWS', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 
> 'VALUES', 'CREATE', 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 
> 'DESCRIBE', 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'SHOW', 'TABLES', 
> 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'TO', 
> 'TABLESAMPLE', 'ALTER', 'RENAME', 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 
> 'RESET', 'DATA', 'START', 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'IF', 
> 'PERCENT', 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 
> 'OVERWRITE', 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 
> 'RECORDREADER', 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 
> 'COLLECTION', 'ITEMS', 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'EXTENDED', 
> 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', TEMPORARY, 
> 'OPTIONS', 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 
> 'STORED', 'DIRECTORIES', 'LOCATION', 

[jira] [Commented] (SPARK-11857) Remove Mesos fine-grained mode subject to discussions

2016-07-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363345#comment-15363345
 ] 

Michael Gummelt commented on SPARK-11857:
-

For completeness, here's my theoretical analysis to augment our empirical 
observation that users don't mind fine-grained mode being removed.

Fine-grained mode provides two benefits:
1) Slow-start
  Executors are brought up lazily

2) Relinquishing cores
  Cores are relinquished back to Mesos as Spark tasks terminate

Fine-grained mode does *not* provide the following benefits, though some think 
it does:
a) Relinquishing memory
  The JVM doesn't relinquish memory, so it would be unsafe for us to resize the 
cgroup

b) Relinquishing executors

As for alternatives to these benefits, 1) is provided by dynamic allocation, 
though we need a better recommended setup for it, as I document here: 
http://apache-spark-developers-list.1001551.n3.nabble.com/HDFS-as-Shuffle-Service-td17340.html
There is no alternative to 2), but we've generally found that the 
executor-level granularity of dynamic allocation is sufficient for most users. 
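
For reference, a minimal sketch of what benefit 1) looks like on coarse-grained 
mode via dynamic allocation. The property names are real, but the values are 
placeholders rather than recommendations, and on Mesos the external shuffle 
service must also be running on each agent.

{code}
import org.apache.spark.SparkConf

// Lazily grow and shrink the executor set instead of relying on fine-grained tasks.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")               // needs the external shuffle service
  .set("spark.dynamicAllocation.minExecutors", "0")           // start with nothing ("slow start")
  .set("spark.dynamicAllocation.maxExecutors", "20")          // placeholder upper bound
  .set("spark.dynamicAllocation.executorIdleTimeout", "60s")  // give resources back when idle
{code}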

> Remove Mesos fine-grained mode subject to discussions
> -
>
> Key: SPARK-11857
> URL: https://issues.apache.org/jira/browse/SPARK-11857
> Project: Spark
>  Issue Type: Sub-task
>  Components: Mesos
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> See discussions in
> http://apache-spark-developers-list.1001551.n3.nabble.com/Removing-the-Mesos-fine-grained-mode-td15277.html
> and
> http://apache-spark-developers-list.1001551.n3.nabble.com/Please-reply-if-you-use-Mesos-fine-grained-mode-td14930.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11857) Remove Mesos fine-grained mode subject to discussions

2016-07-05 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363331#comment-15363331
 ] 

Reynold Xin commented on SPARK-11857:
-

Michael, can you submit a pull request to log a deprecation warning and perhaps
also update the docs?




> Remove Mesos fine-grained mode subject to discussions
> -
>
> Key: SPARK-11857
> URL: https://issues.apache.org/jira/browse/SPARK-11857
> Project: Spark
>  Issue Type: Sub-task
>  Components: Mesos
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> See discussions in
> http://apache-spark-developers-list.1001551.n3.nabble.com/Removing-the-Mesos-fine-grained-mode-td15277.html
> and
> http://apache-spark-developers-list.1001551.n3.nabble.com/Please-reply-if-you-use-Mesos-fine-grained-mode-td14930.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363324#comment-15363324
 ] 

Sean Owen commented on SPARK-16379:
---

Of course we'd all like to never have bugs. Nobody makes bugs on purpose. Bugs 
exist though, and everyone agrees something has to be fixed. This just states 
the obvious and is not actionable. What does that mean for _how_ to fix _this_ 
bug?  I will make a PR in any event since I think this issue is understood by 
now.

> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing mesos on your machine and try to 
> connect with spark shell from bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Stavros Kontopoulos (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stavros Kontopoulos updated SPARK-16379:

Comment: was deleted

(was: > We have a bug and need to address it in the best way we can see.
I  agree. However, how to do that is what we are discussing up to now.)

> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing mesos on your machine and try to 
> connect with spark shell from bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11857) Remove Mesos fine-grained mode subject to discussions

2016-07-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363321#comment-15363321
 ] 

Michael Gummelt commented on SPARK-11857:
-

[~amcelwee] Do you have any more input on this issue? We're moving forward 
with deprecating fine-grained mode, but we're willing to solve your issue first.

> Remove Mesos fine-grained mode subject to discussions
> -
>
> Key: SPARK-11857
> URL: https://issues.apache.org/jira/browse/SPARK-11857
> Project: Spark
>  Issue Type: Sub-task
>  Components: Mesos
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> See discussions in
> http://apache-spark-developers-list.1001551.n3.nabble.com/Removing-the-Mesos-fine-grained-mode-td15277.html
> and
> http://apache-spark-developers-list.1001551.n3.nabble.com/Please-reply-if-you-use-Mesos-fine-grained-mode-td14930.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363302#comment-15363302
 ] 

Stavros Kontopoulos edited comment on SPARK-16379 at 7/5/16 9:49 PM:
-

There was no bug previously in the scheduler; it was working before, I guess.
The project was not broken, and the best practice is to keep it that way at all 
times. 
I think we can agree on that, right?


was (Author: skonto):
There was no bug previously in the scheduler. It was working or not?
The project was not broken and the best practice is to keep it that way all the 
time. 
Otherwise if we dont follow best practices we can do anything we want. I dont 
understand
why we cannot even agree on that?

> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing mesos on your machine and try to 
> connect with spark shell from bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363314#comment-15363314
 ] 

Stavros Kontopoulos commented on SPARK-16379:
-

> Nobody is suggesting knowingly making a change that triggers a bug, so I am 
> not sure what this is arguing against in this context.

I am just saying that now that we know it's an issue, we could revert the 
commit and do the improvements next. It's a blocker.

> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing mesos on your machine and try to 
> connect with spark shell from bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11857) Remove Mesos fine-grained mode subject to discussions

2016-07-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363311#comment-15363311
 ] 

Michael Gummelt commented on SPARK-11857:
-

I endorse the deprecation.  Fine-grained mode would be more useful if the JVM 
could shrink in memory as well as cores, but alas...

We at Mesosphere haven't heard any objections from users regarding the loss of 
fine-grained mode.

[~andrewor14] Please cc me if you need Mesos input.  Tim is still active, I 
believe, but no longer at Mesosphere.  I work (mostly) full-time on Spark on 
Mesos.

> Remove Mesos fine-grained mode subject to discussions
> -
>
> Key: SPARK-11857
> URL: https://issues.apache.org/jira/browse/SPARK-11857
> Project: Spark
>  Issue Type: Sub-task
>  Components: Mesos
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> See discussions in
> http://apache-spark-developers-list.1001551.n3.nabble.com/Removing-the-Mesos-fine-grained-mode-td15277.html
> and
> http://apache-spark-developers-list.1001551.n3.nabble.com/Please-reply-if-you-use-Mesos-fine-grained-mode-td14930.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-11857) Remove Mesos fine-grained mode subject to discussions

2016-07-05 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt reopened SPARK-11857:
-

> Remove Mesos fine-grained mode subject to discussions
> -
>
> Key: SPARK-11857
> URL: https://issues.apache.org/jira/browse/SPARK-11857
> Project: Spark
>  Issue Type: Sub-task
>  Components: Mesos
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> See discussions in
> http://apache-spark-developers-list.1001551.n3.nabble.com/Removing-the-Mesos-fine-grained-mode-td15277.html
> and
> http://apache-spark-developers-list.1001551.n3.nabble.com/Please-reply-if-you-use-Mesos-fine-grained-mode-td14930.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363305#comment-15363305
 ] 

Stavros Kontopoulos commented on SPARK-16379:
-

> We have a bug and need to address it in the best way we can see.
I agree. However, how to do that is exactly what we have been discussing.

> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing mesos on your machine and try to 
> connect with spark shell from bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363303#comment-15363303
 ] 

Sean Owen commented on SPARK-16379:
---

It's true, I can't say for sure the problem exists without that line. It's 
suspicious. In any event, it seems worth doing away with it while fixing this 
up. That may still entail reverting the main change for safety, but we should 
also try to prove there's no similar problem still lurking in the previous 
Logging approach that needs working around anywhere.

> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing mesos on your machine and try to 
> connect with spark shell from bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363302#comment-15363302
 ] 

Stavros Kontopoulos commented on SPARK-16379:
-

There was no bug previously in the scheduler. It was working, wasn't it?
The project was not broken, and the best practice is to keep it that way at all 
times. 
Otherwise, if we don't follow best practices, we can do anything we want. I 
don't understand why we cannot even agree on that.

> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing mesos on your machine and try to 
> connect with spark shell from bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363289#comment-15363289
 ] 

Michael Gummelt edited comment on SPARK-16379 at 7/5/16 9:35 PM:
-

I say we add a new lock to synchronize on and be done with it.

The root of the issue is that deadlock detection is hard.  The author of the 
breaking change added a critical region, and to do so safely, you have to 
ensure that all calling code paths haven't acquired the same lock, which is 
difficult (undecidable).

The only process change I can imagine to fix the higher level issue is running 
some sort of deadlock detection tool in the Spark tests.  I agree we shouldn't 
get rid of `lazy val` completely, but it is unfortunate that you can't use them 
in a `synchronized` block.  It's a leaky abstraction.  Seems to be addressed 
here: 
http://docs.scala-lang.org/sips/pending/improved-lazy-val-initialization.html

{quote}
One other thing i hope it holds is no new commit should break the project even 
if it fixes something or reveals another issue etc.
{quote}

Well I do agree with Sean that it's on us to fix bugs revealed by external 
changes.


was (Author: mgummelt):
I say we add a new lock to synchronize on and be done with it.

The root of the issue is that deadlock detection is hard.  The author of the 
breaking change added a critical region, and to do so safely, you have to 
ensure that all calling code paths haven't acquired the same lock, which is 
difficult (undecidable).

The only process change I can imagine to fix the higher level issue is running 
some sort of deadlock detection tool in the Spark tests.  I agree we shouldn't 
get rid of `lazy val` completely, but it is unfortunate that you can't use them 
in a `synchronized` block.  It's a leaky abstraction.  Seems to be addressed 
here: 
http://docs.scala-lang.org/sips/pending/improved-lazy-val-initialization.html

| One other thing i hope it holds is no new commit should break the project 
even if it fixes something or reveals another issue etc.

Well I do agree with Sean that it's on us to fix bugs revealed by external 
changes.

> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing mesos on your machine and try to 
> connect with spark shell from bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363289#comment-15363289
 ] 

Michael Gummelt edited comment on SPARK-16379 at 7/5/16 9:34 PM:
-

I say we add a new lock to synchronize on and be done with it.

The root of the issue is that deadlock detection is hard.  The author of the 
breaking change added a critical region, and to do so safely, you have to 
ensure that all calling code paths haven't acquired the same lock, which is 
difficult (undecidable).

The only process change I can imagine to fix the higher level issue is running 
some sort of deadlock detection tool in the Spark tests.  I agree we shouldn't 
get rid of `lazy val` completely, but it is unfortunate that you can't use them 
in a `synchronized` block.  It's a leaky abstraction.  Seems to be addressed 
here: 
http://docs.scala-lang.org/sips/pending/improved-lazy-val-initialization.html

| One other thing i hope it holds is no new commit should break the project 
even if it fixes something or reveals another issue etc.

Well I do agree with Sean that it's on us to fix bugs revealed by external 
changes.


was (Author: mgummelt):
I say we add a new lock to synchronize on and be done with it.

The root of the issue is that deadlock detection is hard.  The author of the 
breaking change added a critical region, and to do so safely, you have to 
ensure that all calling code paths haven't acquired the same lock, which is 
difficult (undecidable).

The only process change I can imagine to fix the higher level issue is running 
some sort of deadlock detection tool in the Spark tests.  I agree we shouldn't 
get rid of `lazy val` completely, but it is unfortunate that you can't use them 
in a `synchronized` block.  It's a leaky abstraction.  Seems to be addressed 
here: 
http://docs.scala-lang.org/sips/pending/improved-lazy-val-initialization.html

> One other thing i hope it holds is no new commit should break the project 
> even if it fixes something or reveals another issue etc.

Well I do agree with Sean that it's on us to fix bugs revealed by external 
changes.

> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing mesos on your machine and try to 
> connect with spark shell from bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363296#comment-15363296
 ] 

Sean Owen commented on SPARK-16379:
---

I don't think it's even a bad practice, any more than using {{synchronized}}. 
Ideally, if change A uncovers bug B then it needs to be expanded to address the 
bug and committed all at once. Nobody is suggesting knowingly making a change 
that triggers a bug, so I am not sure what this is arguing against in this 
context. We have a bug and need to address it in the best way we can see.

> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing mesos on your machine and try to 
> connect with spark shell from bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363289#comment-15363289
 ] 

Michael Gummelt edited comment on SPARK-16379 at 7/5/16 9:33 PM:
-

I say we add a new lock to synchronize on and be done with it.

The root of the issue is that deadlock detection is hard.  The author of the 
breaking change added a critical region, and to do so safely, you have to 
ensure that all calling code paths haven't acquired the same lock, which is 
difficult (undecidable).

The only process change I can imagine to fix the higher level issue is running 
some sort of deadlock detection tool in the Spark tests.  I agree we shouldn't 
get rid of `lazy val` completely, but it is unfortunate that you can't use them 
in a `synchronized` block.  It's a leaky abstraction.  Seems to be addressed 
here: 
http://docs.scala-lang.org/sips/pending/improved-lazy-val-initialization.html

> One other thing i hope it holds is no new commit should break the project 
> even if it fixes something or reveals another issue etc.

Well I do agree with Sean that it's on us to fix bugs revealed by external 
changes.


was (Author: mgummelt):
I say we add a new lock to synchronize on and be done with it.

The root of the issue is that deadlock detection is hard.  The author of the 
breaking change added a critical region, and to do so safely, you have to 
ensure that all calling code paths haven't acquired the same lock, which is 
difficult (undecidable).

The only process change I can imagine to fix the higher level issue is running 
some sort of deadlock detection tool in the Spark tests.  I agree we shouldn't 
get rid of `lazy val` completely, but it is unfortunate that you can't use them 
in a `synchronized` block.  It's a leaky abstraction.  Seems to be addressed 
here: 
http://docs.scala-lang.org/sips/pending/improved-lazy-val-initialization.html

> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing mesos on your machine and try to 
> connect with spark shell from bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363289#comment-15363289
 ] 

Michael Gummelt edited comment on SPARK-16379 at 7/5/16 9:31 PM:
-

I say we add a new lock to synchronize on and be done with it.

The root of the issue is that deadlock detection is hard.  The author of the 
breaking change added a critical region, and to do so safely, you have to 
ensure that all calling code paths haven't acquired the same lock, which is 
difficult (undecidable).

The only process change I can imagine to fix the higher level issue is running 
some sort of deadlock detection tool in the Spark tests.  I agree we shouldn't 
get rid of `lazy val` completely, but it is unfortunate that you can't use them 
in a `synchronized` block.  It's a leaky abstraction.  Seems to be addressed 
here: 
http://docs.scala-lang.org/sips/pending/improved-lazy-val-initialization.html


was (Author: mgummelt):
I say we add a new lock to synchronize on and be done with it.

The root of the issue is that deadlock detection is hard.  The author of the 
breaking change added a critical region, and to do so safely, you have to 
ensure that all calling code paths haven't acquired the same lock, which is 
difficult (undecidable).

The only process change I can imagine to fix the higher level issue is running 
some sort of deadlock detection tool in the Spark tests.

> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing mesos on your machine and try to 
> connect with spark shell from bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363290#comment-15363290
 ] 

Michael Gummelt commented on SPARK-16379:
-

Hmmm, since that's a different lock, I don't see the possibility for deadlock 
in the previous code, but I'm content to relinquish the point.  Concurrency is 
hard :)

> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing mesos on your machine and try to 
> connect with spark shell from bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363289#comment-15363289
 ] 

Michael Gummelt commented on SPARK-16379:
-

I say we add a new lock to synchronize on and be done with it.

The root of the issue is that deadlock detection is hard.  The author of the 
breaking change added a critical region, and to do so safely, you have to 
ensure that all calling code paths haven't acquired the same lock, which is 
difficult (undecidable).

The only process change I can imagine to fix the higher level issue is running 
some sort of deadlock detection tool in the Spark tests.
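
To make the hazard concrete, here is a minimal standalone sketch (not Spark's 
code, just an illustration assuming Scala 2.11 lazy-val semantics, where the 
initializer runs while holding the enclosing object's monitor). Running it is 
expected to hang in most interleavings:

{code}
// Thread A takes `lock` and then wants Holder's monitor; thread B forces the
// lazy val, taking Holder's monitor, and then wants `lock` inside the initializer.
object LazyValDeadlockSketch {
  val lock = new Object

  object Holder {
    lazy val value: Int = lock.synchronized { 42 }  // initializer needs `lock`
  }

  def main(args: Array[String]): Unit = {
    val a = new Thread(new Runnable {
      def run(): Unit = lock.synchronized {
        Thread.sleep(100)          // give B time to start forcing the lazy val
        Holder.synchronized { }    // blocks: B holds Holder's monitor
      }
    })
    val b = new Thread(new Runnable {
      def run(): Unit = { Holder.value; () }  // blocks on `lock` inside the initializer
    })
    a.start(); b.start(); a.join(); b.join()  // typically never returns
  }
}
{code}

In the sketch, either having thread A take a dedicated lock instead of Holder's 
monitor, or replacing the lazy val with a def (as the reporter verified), 
removes the cycle.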

> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing mesos on your machine and try to 
> connect with spark shell from bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363278#comment-15363278
 ] 

Stavros Kontopoulos edited comment on SPARK-16379 at 7/5/16 9:27 PM:
-

OK, we can be flexible; that's not the issue. A warning at least, given it has 
caused an issue twice. Sometimes I prefer to be more strict, but it's just a 
suggestion. It could be added as a warning in the project code guidelines, for 
example. 

One other thing I hope holds is that no new commit should break the project, 
even if it fixes something or reveals another issue.



was (Author: skonto):
Ok we can be flexible thats not the issue. A warning at least. Given it has 
caused an issue twice. Sometimes i prefer to be more strict, but its just a 
suggestion. Could be added as a warning in the project code guidelines for 
example. 

One other thing i hope it holds is no new commit should break the project even 
if ti fixes something or reveals another issue.


> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing mesos on your machine and try to 
> connect with spark shell from bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363278#comment-15363278
 ] 

Stavros Kontopoulos edited comment on SPARK-16379 at 7/5/16 9:26 PM:
-

OK, we can be flexible; that's not the issue. A warning at least, given it has 
caused an issue twice. Sometimes I prefer to be more strict, but it's just a 
suggestion. It could be added as a warning in the project code guidelines, for 
example. 

One other thing I hope holds is that no new commit should break the project, 
even if it fixes something or reveals another issue.



was (Author: skonto):
Ok we can be flexible thats not the issue. A warning at least. Given it has 
caused an issue twice. Sometimes i prefer to be more strict, but its just a 
suggestion. Could be added as a warning in the project code guidelines for 
example. 

> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing mesos on your machine and try to 
> connect with spark shell from bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363278#comment-15363278
 ] 

Stavros Kontopoulos commented on SPARK-16379:
-

OK, we can be flexible; that's not the issue. A warning at least, given it has 
caused an issue twice. Sometimes I prefer to be more strict, but it's just a 
suggestion. It could be added as a warning in the project code guidelines, for 
example. 

> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing Mesos on your machine and trying to 
> connect with spark-shell from the bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363272#comment-15363272
 ] 

Sean Owen commented on SPARK-16379:
---

Yes, but that is what I am arguing. Above you said it should be prohibited in 
all cases. I don't think it should be prohibited.

> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing Mesos on your machine and trying to 
> connect with spark-shell from the bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363270#comment-15363270
 ] 

Stavros Kontopoulos edited comment on SPARK-16379 at 7/5/16 9:18 PM:
-

> Arguably it's "synchronized" that is the issue here, really.
Is it forbidden to use a synchronized block if I know what I am doing? The same 
applies to the log lazy val.
If you know what you are doing, I am sure it's fine.
The problem here is that we have two incompatible pieces of code, and we have to 
merge them somehow in order to proceed.


was (Author: skonto):
> Arguably it's "synchronized" that is the issue here, really.
Is it forbidden to use a synchronized block if I know what I am doing? The same 
applies to the log lazy val.
If you know what you are doing, I am sure it's fine.

> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing Mesos on your machine and trying to 
> connect with spark-shell from the bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363269#comment-15363269
 ] 

Sean Owen commented on SPARK-16379:
---

Oh I also meant this as the existing workaround: 
https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec#diff-7d99a7c7a051e5e851aaaefb275a44a1L103

> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing Mesos on your machine and trying to 
> connect with spark-shell from the bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363270#comment-15363270
 ] 

Stavros Kontopoulos commented on SPARK-16379:
-

> Arguably it's "synchronized" that is the issue here, really.
Is it forbidden to use a synchronized block if I know what I am doing? The same 
applies to the log lazy val.
If you know what you are doing, I am sure it's fine.

> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing Mesos on your machine and trying to 
> connect with spark-shell from the bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363266#comment-15363266
 ] 

Sean Owen commented on SPARK-16379:
---

I mean this: 
https://github.com/apache/spark/blob/044971eca0ff3c2ce62afa665dbd3072d52cbbec/core/src/main/scala/org/apache/spark/internal/Logging.scala#L94

> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing Mesos on your machine and trying to 
> connect with spark-shell from the bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363258#comment-15363258
 ] 

Michael Gummelt edited comment on SPARK-16379 at 7/5/16 9:15 PM:
-

> it's entirely possible that code has a bug that's only revealed when some 
> other legitimate change happens

Of course, but I still don't see the bug that existed previously.  Perhaps 
`synchronized` was unnecessary, but I still see no race condition nor deadlock 
in the previous code.  Maybe following up on this will help:

> The previous code also involved acquiring a lock

Link? I don't see this. Or do you just mean the null check? 
https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec#diff-bfd5810d8aa78ad90150e806d830bb78L45



was (Author: mgummelt):
> The previous code also involved acquiring a lock

Link? I don't see this. Or do you just mean the null check? 
https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec#diff-bfd5810d8aa78ad90150e806d830bb78L45


> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing Mesos on your machine and trying to 
> connect with spark-shell from the bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363256#comment-15363256
 ] 

Michael Gummelt commented on SPARK-16379:
-

> The previous code also involved acquiring a lock

Link?  I don't see this.  Or do you just mean the null check? 
https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec#diff-bfd5810d8aa78ad90150e806d830bb78L45



> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing Mesos on your machine and trying to 
> connect with spark-shell from the bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363258#comment-15363258
 ] 

Michael Gummelt commented on SPARK-16379:
-

> The previous code also involved acquiring a lock

Link? I don't see this. Or do you just mean the null check? 
https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec#diff-bfd5810d8aa78ad90150e806d830bb78L45


> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing Mesos on your machine and trying to 
> connect with spark-shell from the bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-16379:

Comment: was deleted

(was: > The previous code also involved acquiring a lock

Link?  I don't see this.  Or do you just mean the null check? 
https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec#diff-bfd5810d8aa78ad90150e806d830bb78L45

)

> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing Mesos on your machine and trying to 
> connect with spark-shell from the bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16363) Spark-submit doesn't work with IAM Roles

2016-07-05 Thread Ewan Leith (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363249#comment-15363249
 ] 

Ewan Leith commented on SPARK-16363:


I'm not sure this is a major issue, but try running with the s3a:// filesystem 
path: it looks like you're using the legacy Jets3t system, which I'm sure 
doesn't support IAM roles.
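
To illustrate the suggestion, here is a minimal sketch (hypothetical bucket and path) of 
what the s3a:// scheme looks like from application code, assuming the hadoop-aws module 
and its AWS SDK dependency are on the classpath; the same scheme change applies to the 
jar URL passed to spark-submit:

{code}
import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch (hypothetical bucket and path), assuming hadoop-aws and its AWS SDK
// dependency are on the classpath so that the s3a:// connector is available.
object S3aIamSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("S3aIamSketch"))
    // No fs.s3.awsAccessKeyId / fs.s3.awsSecretAccessKey set here: unlike the legacy
    // Jets3t-based s3:// connector, s3a can fall back to the EC2 instance profile.
    val lines = sc.textFile("s3a://bucket/dir/input.txt")
    println(s"line count: ${lines.count()}")
    sc.stop()
  }
}
{code}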

> Spark-submit doesn't work with IAM Roles
> 
>
> Key: SPARK-16363
> URL: https://issues.apache.org/jira/browse/SPARK-16363
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 1.6.2
> Environment: Spark Stand-Alone with EC2 instances configured with IAM 
> Roles. 
>Reporter: Ashic Mahtab
>
> When running Spark Stand-alone in EC2 boxes, 
> spark-submit --master spark://master-ip:7077 --class Foo 
> --deploy-mode cluster --verbose s3://bucket/dir/foo/jar
> fails to find the jar even if AWS IAM roles are configured to allow the EC2 
> boxes (that are running Spark master, and workers) access to the file in S3. 
> The exception is provided below. It's asking us to set keys, etc. when the 
> boxes are configured via IAM roles. 
> 16/07/04 11:44:09 ERROR ClientEndpoint: Exception from cluster was: 
> java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key 
> must be specified as the username or password (respectively) of a s3 URL, or 
> by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties 
> (respectively).
> java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key 
> must be specified as the username or password (respectively) of a s3 URL, or 
> by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties 
> (respectively).
> at 
> org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:66)
> at 
> org.apache.hadoop.fs.s3.Jets3tFileSystemStore.initialize(Jets3tFileSystemStore.java:82)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
> at com.sun.proxy.$Proxy5.initialize(Unknown Source)
> at 
> org.apache.hadoop.fs.s3.S3FileSystem.initialize(S3FileSystem.java:77)
> at 
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1446)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1464)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:263)
> at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1686)
> at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:598)
> at org.apache.spark.util.Utils$.fetchFile(Utils.scala:395)
> at 
> org.apache.spark.deploy.worker.DriverRunner.org$apache$spark$deploy$worker$DriverRunner$$downloadUserJar(DriverRunner.scala:150)
> at 
> org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:79)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15425) Disallow cartesian joins by default

2016-07-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363218#comment-15363218
 ] 

Apache Spark commented on SPARK-15425:
--

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/14057

> Disallow cartesian joins by default
> ---
>
> Key: SPARK-15425
> URL: https://issues.apache.org/jira/browse/SPARK-15425
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Sameer Agarwal
> Fix For: 2.0.0
>
>
> It is fairly easy for users to shoot themselves in the foot if they run 
> cartesian joins. Often they might not even be aware of the join methods 
> chosen. This happened to me a few times in the last few weeks.
> It would be a good idea to disable cartesian joins by default, and require 
> explicit enabling of it via "crossJoin" method or in SQL "cross join". This 
> however might be too large of a scope for 2.0 given the timing. As a small 
> and quick fix, we can just have a single config option 
> (spark.sql.join.enableCartesian) that controls this behavior. In the future 
> we can implement the fine-grained control.
> Note that the error message should be friendly and say "Set 
> spark.sql.join.enableCartesian to true to turn on cartesian joins."
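
For illustration, a hedged sketch of how the proposed opt-in could look from user code, 
using the config name and the crossJoin method mentioned in the description; the names 
that actually ship may differ:

{code}
import org.apache.spark.sql.SparkSession

// Hypothetical sketch of the proposed opt-in, using the config name from the
// description above; the final config name and API may differ.
object CartesianJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CartesianJoinSketch")
      .master("local[2]")
      .getOrCreate()
    import spark.implicits._

    val left  = Seq(1, 2, 3).toDF("a")
    val right = Seq("x", "y").toDF("b")

    // Without the explicit opt-in, a plan that degenerates into a cartesian product
    // would fail with a message pointing at this setting.
    spark.conf.set("spark.sql.join.enableCartesian", "true")
    left.crossJoin(right).show()   // 6 rows: every (a, b) combination
  }
}
{code}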



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363199#comment-15363199
 ] 

Sean Owen commented on SPARK-16379:
---

I don't agree with that logic; it's entirely possible that code has a bug 
that's only revealed when some other legitimate change happens, and the right 
subsequent change is to fix the bug. I don't think we'd ban lazy vals either. 
Arguably it's "synchronized" that is the issue here, really.

Indeed, reverting the last patch only 'fixes' it because the code contained a 
hack to avoid this condition. The previous code also involved acquiring a lock, 
and I'm guessing it _could_ still be a problem, though less likely to come up 
given that the locking only happens during the first call (well hopefully). 
Removing the logInfo actually removes the issue more directly than 
reintroducing the hack. Changing the startScheduler method is _probably_ the 
right-er fix, though that's less conservative.

I'm not against reverting the change just on the grounds that Logging is 
inherited in lots of places and so there's a risk of a repeat of this problem 
elsewhere, even if it may ultimately also be due to some other coding problem. 
I'd just rather not also reintroduce the hack.
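
To make the failure mode concrete, here is a minimal, self-contained sketch (hypothetical 
class and method names, not the Spark code) of the pattern under discussion, assuming the 
usual Scala encoding where initializing a lazy val synchronizes on the enclosing instance:

{code}
import java.util.concurrent.CountDownLatch

// Stand-ins: Backend ~ the Mesos scheduler backend mixing in Logging,
// start() ~ startScheduler(), registered() ~ the Mesos driver callback.
object LazyLogDeadlockSketch {
  class Backend {
    // Stand-in for Logging's @transient lazy val log: first access locks `this`.
    lazy val log: String = "logger-" + System.nanoTime()

    private val registerLatch = new CountDownLatch(1)

    // The instance monitor is held while waiting for registration.
    def start(): Unit = synchronized {
      val callbackThread = new Thread(new Runnable {
        def run(): Unit = registered()
      })
      callbackThread.start()
      registerLatch.await()   // blocks while still holding the monitor
    }

    // The log access forces the lazy val, which needs the monitor held above,
    // so the latch is never counted down and both threads hang.
    def registered(): Unit = {
      println(s"registered, using $log")
      registerLatch.countDown()
    }
  }

  def main(args: Array[String]): Unit = new Backend().start()   // deadlocks
}
{code}

Replacing the lazy val with a def, or not holding the monitor while waiting for 
registration, breaks the cycle, which matches the verification described in the issue.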

> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing Mesos on your machine and trying to 
> connect with spark-shell from the bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16367) Wheelhouse Support for PySpark

2016-07-05 Thread Semet (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363189#comment-15363189
 ] 

Semet commented on SPARK-16367:
---

This is where the magic of wheels lies:
- look at https://pypi.python.org/pypi/numpy : there are wheels for the various 
Python versions, 32/64-bit, Linux/Mac/Windows. Simply copy them from 
pypi.python.org (plus some drops), and that's all; no compilation needed
- no compilation is needed upon installation either, and if all wheels are put in 
the wheelhouse archive, the installation only consists of unzipping the packages 
(handled automatically by pip install)
- creating the wheelhouse is really simple: pip install wheel, and then pip 
wheel. I'll write a tutorial in the documentation.

I have actually rebased your patch, so the cache thing will be kept :)

> Wheelhouse Support for PySpark
> --
>
> Key: SPARK-16367
> URL: https://issues.apache.org/jira/browse/SPARK-16367
> Project: Spark
>  Issue Type: New Feature
>  Components: Deploy, PySpark
>Affects Versions: 1.6.1, 1.6.2, 2.0.0
>Reporter: Semet
>  Labels: newbie, python, python-wheel, wheelhouse
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> *Rationale*
> It is recommended, in order to deploy Spark packages written in Scala, to 
> build big fat jar files. This allows having all dependencies in one package, 
> so the only "cost" is the copy time to deploy this file on every Spark node.
> On the other hand, Python deployment is more difficult once you want to use 
> external packages, and you don't really want to mess with IT to deploy the 
> packages into the virtualenv of each node.
> *Previous approaches*
> I based the current proposal on the following two issues related to this 
> point:
> - SPARK-6764 ("Wheel support for PySpark")
> - SPARK-13587 ("Support virtualenv in PySpark")
> The first part of my proposal is to merge them, in order to support wheel 
> installation and virtualenv creation.
> *Uber Fat Wheelhouse for Python Deployment*
> In Python, the packaging standard is now "wheels", which goes further than 
> the good old ".egg" files. With a wheel file (".whl"), the package is already 
> prepared for a given architecture. You can have several wheels, each specific 
> to an architecture or environment. 
> The {{pip}} tool knows how to select the package matching the current system 
> and how to install it at light speed. Said otherwise, a package that 
> requires compilation of a C module, for instance, does *not* compile anything 
> when installed from a wheel file.
> {{pip}} also provides the ability to easily generate the wheels of all 
> packages used by a given module (inside a "virtualenv"). This is called a 
> "wheelhouse". You can even skip this compilation and retrieve the wheels 
> directly from pypi.python.org.
> *Developer workflow*
> Here is, more concretely, my proposal from the PySpark developer's point 
> of view:
> - you are writing a PySpark script that grows in size and 
> dependencies. Deploying it on Spark, for example, requires building numpy or 
> Theano and other dependencies
> - to use the "Big Fat Wheelhouse" support of PySpark, you need to turn this 
> script into a standard Python package:
> -- write a {{requirements.txt}}. I recommend pinning all package versions. 
> You can use [pip-tools|https://github.com/nvie/pip-tools] to maintain the 
> requirements.txt
> {code}
> astroid==1.4.6# via pylint
> autopep8==1.2.4
> click==6.6# via pip-tools
> colorama==0.3.7   # via pylint
> enum34==1.1.6 # via hypothesis
> findspark==1.0.0  # via spark-testing-base
> first==2.0.1  # via pip-tools
> hypothesis==3.4.0 # via spark-testing-base
> lazy-object-proxy==1.2.2  # via astroid
> linecache2==1.0.0 # via traceback2
> pbr==1.10.0
> pep8==1.7.0   # via autopep8
> pip-tools==1.6.5
> py==1.4.31# via pytest
> pyflakes==1.2.3
> pylint==1.5.6
> pytest==2.9.2 # via spark-testing-base
> six==1.10.0   # via astroid, pip-tools, pylint, unittest2
> spark-testing-base==0.0.7.post2
> traceback2==1.4.0 # via unittest2
> unittest2==1.1.0  # via spark-testing-base
> wheel==0.29.0
> wrapt==1.10.8 # via astroid
> {code}
> -- write a setup.py with some entry points or packages. Use 
> [PBR|http://docs.openstack.org/developer/pbr/]; it makes the job of 
> maintaining a setup.py file really easy
> -- create a virtualenv if not already in one:
> {code}
> virtualenv env
> {code}
> -- Work on your environment, define the requirements you need in 
> {{requirements.txt}}, and do all the {{pip install}} you need.
> - create the wheelhouse for your current project
> {code}
> pip install wheelhouse
> pip wheel . --wheel-dir wheelhouse
> {code}
> This can take some 

[jira] [Assigned] (SPARK-16385) NoSuchMethodException thrown by Utils.waitForProcess

2016-07-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16385:


Assignee: (was: Apache Spark)

> NoSuchMethodException thrown by Utils.waitForProcess
> 
>
> Key: SPARK-16385
> URL: https://issues.apache.org/jira/browse/SPARK-16385
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Marcelo Vanzin
>
> The code in Utils.waitForProcess catches the wrong exception: when using 
> reflection, {{NoSuchMethodException}} is thrown, but the code catches 
> {{NoSuchMethodError}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16385) NoSuchMethodException thrown by Utils.waitForProcess

2016-07-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363154#comment-15363154
 ] 

Apache Spark commented on SPARK-16385:
--

User 'vanzin' has created a pull request for this issue:
https://github.com/apache/spark/pull/14056

> NoSuchMethodException thrown by Utils.waitForProcess
> 
>
> Key: SPARK-16385
> URL: https://issues.apache.org/jira/browse/SPARK-16385
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Marcelo Vanzin
>
> The code in Utils.waitForProcess catches the wrong exception: when using 
> reflection, {{NoSuchMethodException}} is thrown, but the code catches 
> {{NoSuchMethodError}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16385) NoSuchMethodException thrown by Utils.waitForProcess

2016-07-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16385:


Assignee: Apache Spark

> NoSuchMethodException thrown by Utils.waitForProcess
> 
>
> Key: SPARK-16385
> URL: https://issues.apache.org/jira/browse/SPARK-16385
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Marcelo Vanzin
>Assignee: Apache Spark
>
> The code in Utils.waitForProcess catches the wrong exception: when using 
> reflection, {{NoSuchMethodException}} is thrown, but the code catches 
> {{NoSuchMethodError}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16385) NoSuchMethodException thrown by Utils.waitForProcess

2016-07-05 Thread Marcelo Vanzin (JIRA)
Marcelo Vanzin created SPARK-16385:
--

 Summary: NoSuchMethodException thrown by Utils.waitForProcess
 Key: SPARK-16385
 URL: https://issues.apache.org/jira/browse/SPARK-16385
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.0.0
Reporter: Marcelo Vanzin


The code in Utils.waitForProcess catches the wrong exception: when using 
reflection, {{NoSuchMethodException}} is thrown, but the code catches 
{{NoSuchMethodError}}.
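
For illustration, a hedged sketch (not the actual Utils code) of the reflection pattern 
described here, showing the checked exception that has to be caught when the Java 8-only 
overload is missing:

{code}
import java.util.concurrent.TimeUnit

// Sketch only: Class.getMethod throws the checked NoSuchMethodException when
// Process.waitFor(long, TimeUnit) does not exist (Java 7); NoSuchMethodError
// is an unrelated linkage error and would never be thrown here.
object WaitForProcessSketch {
  def waitForProcess(process: Process, timeoutMs: Long): Boolean = {
    try {
      val m = classOf[Process].getMethod("waitFor", classOf[Long], classOf[TimeUnit])
      m.invoke(process, java.lang.Long.valueOf(timeoutMs), TimeUnit.MILLISECONDS)
        .asInstanceOf[Boolean]
    } catch {
      case _: NoSuchMethodException =>
        // Java 7 fallback: poll until the process exits or the timeout expires.
        val deadline = System.currentTimeMillis() + timeoutMs
        var exited = false
        while (!exited && System.currentTimeMillis() < deadline) {
          try { process.exitValue(); exited = true }
          catch { case _: IllegalThreadStateException => Thread.sleep(100) }
        }
        exited
    }
  }
}
{code}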



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16385) NoSuchMethodException thrown by Utils.waitForProcess

2016-07-05 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363153#comment-15363153
 ] 

Marcelo Vanzin commented on SPARK-16385:


Here's what I see when running unit tests on java 7:

{noformat}
Exception in thread "ExecutorRunner for app-20160705131725-/1" 
java.lang.NoSuchMethodException: java.lang.Process.waitFor(long, 
java.util.concurrent.TimeUnit)
at java.lang.Class.getMethod(Class.java:1678)
at org.apache.spark.util.Utils$.waitForProcess(Utils.scala:1812)
at org.apache.spark.util.Utils$.terminateProcess(Utils.scala:1783)
at 
org.apache.spark.deploy.worker.ExecutorRunner.org$apache$spark$deploy$worker$ExecutorRunner$$killProcess(ExecutorRunner.scala:101)
at 
org.apache.spark.deploy.worker.ExecutorRunner.org$apache$spark$deploy$worker$ExecutorRunner$$fetchAndRunExecutor(ExecutorRunner.scala:185)
at 
org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:73)
{noformat}


> NoSuchMethodException thrown by Utils.waitForProcess
> 
>
> Key: SPARK-16385
> URL: https://issues.apache.org/jira/browse/SPARK-16385
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Marcelo Vanzin
>
> The code in Utils.waitForProcess catches the wrong exception: when using 
> reflection, {{NoSuchMethodException}} is thrown, but the code catches 
> {{NoSuchMethodError}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


