[jira] [Updated] (SPARK-18361) Expose RDD localCheckpoint in PySpark

2016-11-21 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-18361:
--
Assignee: Gabriel Huang

> Expose RDD localCheckpoint in PySpark
> -
>
> Key: SPARK-18361
> URL: https://issues.apache.org/jira/browse/SPARK-18361
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Reporter: Gabriel Huang
>Assignee: Gabriel Huang
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> As of today, rdd.localCheckpoint() is not accessible from PySpark.
> This is an important issue for machine learning users, as we often have to 
> iterate algorithms and perform operations like joins in each iteration. 
> If the lineage is not truncated, memory usage, lineage length, and 
> computation time all grow without bound. rdd.localCheckpoint() seems like 
> the most straightforward way of truncating the lineage, but the Python API 
> does not expose it.
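
A minimal PySpark sketch of the use case once localCheckpoint() is exposed
(assumes Spark 2.1+, where this issue landed; the PageRank-style loop is
illustrative only):

{code}
from pyspark import SparkContext

sc = SparkContext("local[*]", "lineage-truncation-demo")
links = sc.parallelize([(i, i % 7) for i in range(100)])   # static relation
ranks = sc.parallelize([(i, 1.0) for i in range(100)])     # iterated relation

for _ in range(20):
    # each iteration appends a join + map to the lineage of `ranks`
    ranks = links.join(ranks).mapValues(lambda v: 0.85 * v[1] + 0.15)
    # truncate the lineage; unlike checkpoint(), no reliable storage
    # (e.g. HDFS) is needed, at the cost of weaker fault tolerance
    ranks.localCheckpoint()
    ranks.count()  # an action materializes the checkpoint
{code}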



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-18361) Expose RDD localCheckpoint in PySpark

2016-11-21 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-18361.
---
  Resolution: Fixed
   Fix Version/s: 2.1.0
Target Version/s: 2.1.0

> Expose RDD localCheckpoint in PySpark
> -
>
> Key: SPARK-18361
> URL: https://issues.apache.org/jira/browse/SPARK-18361
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Reporter: Gabriel Huang
>Assignee: Gabriel Huang
> Fix For: 2.1.0
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> As of today, rdd.localCheckpoint() is not accessible from PySpark.
> This is an important issue for machine learning users, as we often have to 
> iterate algorithms and perform operations like joins in each iteration. 
> If the lineage is not truncated, memory usage, lineage length, and 
> computation time all grow without bound. rdd.localCheckpoint() seems like 
> the most straightforward way of truncating the lineage, but the Python API 
> does not expose it.






[jira] [Resolved] (SPARK-18517) DROP TABLE IF EXISTS should not warn for non-existing tables

2016-11-21 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-18517.
---
  Resolution: Fixed
Assignee: Dongjoon Hyun
   Fix Version/s: 2.1.0
Target Version/s: 2.1.0

> DROP TABLE IF EXISTS should not warn for non-existing tables
> 
>
> Key: SPARK-18517
> URL: https://issues.apache.org/jira/browse/SPARK-18517
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 2.1.0
>
>
> Currently, `DROP TABLE IF EXISTS` logs a warning for non-existing tables. 
> However, by the definition of the command, it should stay quiet in this case.
> {code}
> scala> sql("DROP TABLE IF EXISTS nonexist")
> 16/11/20 20:48:26 WARN DropTableCommand: 
> org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 
> 'nonexist' not found in database 'default';
> {code}
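
A quick sanity check of the fixed behavior from PySpark (a sketch; assumes an
existing `spark` session):

{code}
# after the fix this completes silently instead of logging the
# NoSuchTableException warning shown above
spark.sql("DROP TABLE IF EXISTS nonexist")
{code}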






[jira] [Resolved] (SPARK-18507) Major performance regression in SHOW PARTITIONS on partitioned Hive tables

2016-11-22 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-18507.
---
  Resolution: Fixed
Assignee: Wenchen Fan
   Fix Version/s: 2.1.0
Target Version/s: 2.1.0

> Major performance regression in SHOW PARTITIONS on partitioned Hive tables
> --
>
> Key: SPARK-18507
> URL: https://issues.apache.org/jira/browse/SPARK-18507
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Michael Allman
>Assignee: Wenchen Fan
>Priority: Critical
> Fix For: 2.1.0
>
>
> Commit {{ccb11543048dccd4cc590a8db1df1d9d5847d112}} 
> (https://github.com/apache/spark/commit/ccb11543048dccd4cc590a8db1df1d9d5847d112)
>  appears to have introduced a major regression in the performance of the Hive 
> {{SHOW PARTITIONS}} command. Running that command on a Hive table with 17,337 
> partitions in the {{spark-sql}} shell with the parent commit of {{ccb1154}} 
> takes approximately 7.3 seconds. Running the same command with commit 
> {{ccb1154}} takes approximately 250 seconds.
> I have not had the opportunity to complete a thorough investigation, but I 
> suspect the problem lies in the diff hunk beginning at 
> https://github.com/apache/spark/commit/ccb11543048dccd4cc590a8db1df1d9d5847d112#diff-159191585e10542f013cb3a714f26075L675.
>  If that's the case, this performance issue should manifest itself in other 
> areas as this programming pattern was used elsewhere in this commit.






[jira] [Reopened] (SPARK-18050) spark 2.0.1 enable hive throw AlreadyExistsException(message:Database default already exists)

2016-11-23 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or reopened SPARK-18050:
---
  Assignee: Wenchen Fan

> spark 2.0.1 enable hive throw AlreadyExistsException(message:Database default 
> already exists)
> -
>
> Key: SPARK-18050
> URL: https://issues.apache.org/jira/browse/SPARK-18050
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
> Environment: jdk1.8, macOs,spark 2.0.1
>Reporter: todd.chen
>Assignee: Wenchen Fan
>
> In Spark 2.0.1, I enable Hive support, and initializing the SQLContext throws 
> an AlreadyExistsException(message:Database default already exists), the same 
> as https://www.mail-archive.com/dev@spark.apache.org/msg15306.html. My code is:
> {code}
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.FileSystem
> import org.apache.spark.SparkConf
> import org.apache.spark.sql.SparkSession
>
> val master = "local[*]"
> val appName = "xqlServerSpark"
> // FileSystem.get requires a Hadoop Configuration
> val fileSystem = FileSystem.get(new Configuration())
> val sparkConf = new SparkConf()
>   .setMaster(master)
>   .setAppName(appName)
>   .set("spark.sql.warehouse.dir",
>     s"${fileSystem.getUri.toASCIIString}/user/hive/warehouse")
> val hiveContext =
>   SparkSession.builder().config(sparkConf).enableHiveSupport().getOrCreate().sqlContext
> print(sparkConf.get("spark.sql.warehouse.dir"))
> hiveContext.sql("show tables").show()
> {code}
> The result is correct, but the code also throws the exception.






[jira] [Resolved] (SPARK-18050) spark 2.0.1 enable hive throw AlreadyExistsException(message:Database default already exists)

2016-11-23 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-18050.
---
  Resolution: Fixed
   Fix Version/s: 2.1.0
Target Version/s: 2.1.0

> spark 2.0.1 enable hive throw AlreadyExistsException(message:Database default 
> already exists)
> -
>
> Key: SPARK-18050
> URL: https://issues.apache.org/jira/browse/SPARK-18050
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
> Environment: jdk1.8, macOs,spark 2.0.1
>Reporter: todd.chen
>Assignee: Wenchen Fan
> Fix For: 2.1.0
>
>
> In Spark 2.0.1, I enable Hive support, and initializing the SQLContext throws 
> an AlreadyExistsException(message:Database default already exists), the same 
> as https://www.mail-archive.com/dev@spark.apache.org/msg15306.html. My code is:
> {code}
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.FileSystem
> import org.apache.spark.SparkConf
> import org.apache.spark.sql.SparkSession
>
> val master = "local[*]"
> val appName = "xqlServerSpark"
> // FileSystem.get requires a Hadoop Configuration
> val fileSystem = FileSystem.get(new Configuration())
> val sparkConf = new SparkConf()
>   .setMaster(master)
>   .setAppName(appName)
>   .set("spark.sql.warehouse.dir",
>     s"${fileSystem.getUri.toASCIIString}/user/hive/warehouse")
> val hiveContext =
>   SparkSession.builder().config(sparkConf).enableHiveSupport().getOrCreate().sqlContext
> print(sparkConf.get("spark.sql.warehouse.dir"))
> hiveContext.sql("show tables").show()
> {code}
> The result is correct, but the code also throws the exception.






[jira] [Created] (SPARK-14893) Re-enable HiveSparkSubmitSuite SPARK-8489 test after HiveContext is removed

2016-04-25 Thread Andrew Or (JIRA)
Andrew Or created SPARK-14893:
-

 Summary: Re-enable HiveSparkSubmitSuite SPARK-8489 test after 
HiveContext is removed
 Key: SPARK-14893
 URL: https://issues.apache.org/jira/browse/SPARK-14893
 Project: Spark
  Issue Type: Bug
  Components: SQL, Tests
Affects Versions: 2.0.0
Reporter: Andrew Or


The test was disabled in https://github.com/apache/spark/pull/12585. To 
re-enable it, we need to rebuild the jar using the updated source code.






[jira] [Created] (SPARK-14895) SparkSession Python API

2016-04-25 Thread Andrew Or (JIRA)
Andrew Or created SPARK-14895:
-

 Summary: SparkSession Python API
 Key: SPARK-14895
 URL: https://issues.apache.org/jira/browse/SPARK-14895
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or









[jira] [Created] (SPARK-14896) Remove HiveContext in Python

2016-04-25 Thread Andrew Or (JIRA)
Andrew Or created SPARK-14896:
-

 Summary: Remove HiveContext in Python
 Key: SPARK-14896
 URL: https://issues.apache.org/jira/browse/SPARK-14896
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or









[jira] [Resolved] (SPARK-14721) Remove the HiveContext class

2016-04-25 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-14721.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Remove the HiveContext class
> 
>
> Key: SPARK-14721
> URL: https://issues.apache.org/jira/browse/SPARK-14721
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
> Fix For: 2.0.0
>
>







[jira] [Created] (SPARK-14902) Expose user-facing RuntimeConfig in SparkSession

2016-04-25 Thread Andrew Or (JIRA)
Andrew Or created SPARK-14902:
-

 Summary: Expose user-facing RuntimeConfig in SparkSession
 Key: SPARK-14902
 URL: https://issues.apache.org/jira/browse/SPARK-14902
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or









[jira] [Created] (SPARK-14904) Add back HiveContext in compatibility package

2016-04-25 Thread Andrew Or (JIRA)
Andrew Or created SPARK-14904:
-

 Summary: Add back HiveContext in compatibility package
 Key: SPARK-14904
 URL: https://issues.apache.org/jira/browse/SPARK-14904
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or









[jira] [Created] (SPARK-14940) Move ExternalCatalog to own file

2016-04-26 Thread Andrew Or (JIRA)
Andrew Or created SPARK-14940:
-

 Summary: Move ExternalCatalog to own file
 Key: SPARK-14940
 URL: https://issues.apache.org/jira/browse/SPARK-14940
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or
Priority: Minor









[jira] [Created] (SPARK-14945) Python SparkSession API

2016-04-26 Thread Andrew Or (JIRA)
Andrew Or created SPARK-14945:
-

 Summary: Python SparkSession API
 Key: SPARK-14945
 URL: https://issues.apache.org/jira/browse/SPARK-14945
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, SQL
Affects Versions: 2.0.0
Reporter: Andrew Or









[jira] [Assigned] (SPARK-14945) Python SparkSession API

2016-04-26 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or reassigned SPARK-14945:
-

Assignee: Andrew Or

> Python SparkSession API
> ---
>
> Key: SPARK-14945
> URL: https://issues.apache.org/jira/browse/SPARK-14945
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>







[jira] [Commented] (SPARK-14915) Tasks that fail due to CommitDeniedException (a side-effect of speculation) can cause job to never complete

2016-04-27 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260623#comment-15260623
 ] 

Andrew Or commented on SPARK-14915:
---

I haven't looked into the scheduler code in detail yet, but it seems to me the 
bug is not caused by your fix to use `CausedBy`. Rather, the bug has always 
existed, and your fix just uncovered it. It does seem like a problem in the 
scheduler; under no circumstances should we retry a stage without limits.

> Tasks that fail due to CommitDeniedException (a side-effect of speculation) 
> can cause job to never complete
> ---
>
> Key: SPARK-14915
> URL: https://issues.apache.org/jira/browse/SPARK-14915
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.2
>Reporter: Jason Moore
>Priority: Critical
>
> In SPARK-14357, code was corrected towards the originally intended behavior 
> that a CommitDeniedException should not count towards the failure count for a 
> job. After having run with this fix for a few weeks, it's become apparent 
> that this behavior has an unintended consequence: a speculative task 
> will continuously receive a CDE from the driver, now causing it to fail and 
> retry over and over without limit.
> I'm thinking we could put a task that receives a CDE from the driver into 
> TaskState.FINISHED or some other state to indicate that the task shouldn't 
> be resubmitted by the TaskScheduler. I'd probably need some opinions on 
> whether there are other consequences of doing something like this.
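
For context, speculation (the feature that produces these
CommitDeniedExceptions for the losing task copy) is enabled through standard
Spark configs; a minimal sketch:

{code}
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setMaster("local[4]")
        .setAppName("speculation-demo")
        # re-launch tasks that run much slower than their stage's median;
        # the copy that loses the output-commit race receives a CDE
        .set("spark.speculation", "true")
        .set("spark.speculation.multiplier", "1.5"))
sc = SparkContext(conf=conf)
{code}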






[jira] [Resolved] (SPARK-14014) Replace existing analysis.Catalog with SessionCatalog

2016-04-27 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-14014.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Replace existing analysis.Catalog with SessionCatalog
> -
>
> Key: SPARK-14014
> URL: https://issues.apache.org/jira/browse/SPARK-14014
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Andrew Or
>Assignee: Andrew Or
> Fix For: 2.0.0
>
>
> As of this moment, there exist many catalogs in Spark. For Spark 2.0, we will 
> have only two high-level catalogs: SessionCatalog and ExternalCatalog. 
> SessionCatalog (implemented in SPARK-13923) keeps track of temporary 
> functions and tables and delegates other operations to ExternalCatalog.
> At the same time, there's a legacy catalog called `analysis.Catalog` that 
> also tracks temporary functions and tables. The goal is to get rid of this 
> legacy catalog and replace it with the new SessionCatalog.






[jira] [Commented] (SPARK-14014) Replace existing analysis.Catalog with SessionCatalog

2016-04-27 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261177#comment-15261177
 ] 

Andrew Or commented on SPARK-14014:
---

Pretty sure this was fixed. :)

> Replace existing analysis.Catalog with SessionCatalog
> -
>
> Key: SPARK-14014
> URL: https://issues.apache.org/jira/browse/SPARK-14014
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Andrew Or
>Assignee: Andrew Or
> Fix For: 2.0.0
>
>
> As of this moment, there exist many catalogs in Spark. For Spark 2.0, we will 
> have only two high-level catalogs: SessionCatalog and ExternalCatalog. 
> SessionCatalog (implemented in SPARK-13923) keeps track of temporary 
> functions and tables and delegates other operations to ExternalCatalog.
> At the same time, there's a legacy catalog called `analysis.Catalog` that 
> also tracks temporary functions and tables. The goal is to get rid of this 
> legacy catalog and replace it with the new SessionCatalog.






[jira] [Updated] (SPARK-14988) Implement catalog and conf API in Python SparkSession

2016-04-28 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-14988:
--
Description: This is like implementing SPARK-13477 and SPARK-13487 in the 
Python SparkSession.

> Implement catalog and conf API in Python SparkSession
> -
>
> Key: SPARK-14988
> URL: https://issues.apache.org/jira/browse/SPARK-14988
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> This is like implementing SPARK-13477 and SPARK-13487 in the Python 
> SparkSession.
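
For reference, a minimal sketch of the ported API from Python (assumes Spark
2.0+; mirrors the Scala counterparts from SPARK-13477 and SPARK-13487):

{code}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("catalog-conf-demo").getOrCreate()

# conf API (SPARK-13487 counterpart)
spark.conf.set("spark.sql.shuffle.partitions", "50")
print(spark.conf.get("spark.sql.shuffle.partitions"))

# catalog API (SPARK-13477 counterpart)
spark.range(10).createOrReplaceTempView("nums")
print([t.name for t in spark.catalog.listTables()])
{code}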






[jira] [Created] (SPARK-14988) Implement catalog and conf API in Python SparkSession

2016-04-28 Thread Andrew Or (JIRA)
Andrew Or created SPARK-14988:
-

 Summary: Implement catalog and conf API in Python SparkSession
 Key: SPARK-14988
 URL: https://issues.apache.org/jira/browse/SPARK-14988
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or









[jira] [Resolved] (SPARK-14988) Implement catalog and conf API in Python SparkSession

2016-04-29 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-14988.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Implement catalog and conf API in Python SparkSession
> -
>
> Key: SPARK-14988
> URL: https://issues.apache.org/jira/browse/SPARK-14988
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
> Fix For: 2.0.0
>
>
> This is like implementing SPARK-13477 and SPARK-13487 in the Python 
> SparkSession.






[jira] [Created] (SPARK-15012) Simplify configuration API further

2016-04-29 Thread Andrew Or (JIRA)
Andrew Or created SPARK-15012:
-

 Summary: Simplify configuration API further
 Key: SPARK-15012
 URL: https://issues.apache.org/jira/browse/SPARK-15012
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or


Proposals:

(1) Remove all the setConf, getConf, etc. methods and just expose `spark.conf`.
(2) Make `spark.conf` also pick up things set in the core `SparkConf`; 
otherwise users may get confused. (Both proposals are sketched below.)
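
A sketch of the shape being proposed, from Python (not a final API):

{code}
from pyspark import SparkConf
from pyspark.sql import SparkSession

core_conf = SparkConf().set("spark.executor.memory", "1g")
spark = SparkSession.builder.config(conf=core_conf).getOrCreate()

# (1) a single entry point instead of the setConf/getConf variants
spark.conf.set("spark.sql.shuffle.partitions", "10")
print(spark.conf.get("spark.sql.shuffle.partitions"))

# (2) entries from the core SparkConf are visible through the same API
print(spark.conf.get("spark.executor.memory"))
{code}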






[jira] [Reopened] (SPARK-14896) Remove HiveContext in Python

2016-04-29 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or reopened SPARK-14896:
---

> Remove HiveContext in Python
> 
>
> Key: SPARK-14896
> URL: https://issues.apache.org/jira/browse/SPARK-14896
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>







[jira] [Closed] (SPARK-14896) Remove HiveContext in Python

2016-04-29 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-14896.
-
Resolution: Won't Fix

> Remove HiveContext in Python
> 
>
> Key: SPARK-14896
> URL: https://issues.apache.org/jira/browse/SPARK-14896
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>







[jira] [Resolved] (SPARK-14673) Remove HiveContext

2016-04-29 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-14673.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Remove HiveContext
> --
>
> Key: SPARK-14673
> URL: https://issues.apache.org/jira/browse/SPARK-14673
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Blocker
> Fix For: 2.0.0
>
>
> In Spark 2.0, we will have a new SparkSession that can run commands against 
> the Hive metastore. The metastore will be initialized lazily so as not to 
> slow down the initialization of spark-shell. This is the first step towards 
> that.






[jira] [Updated] (SPARK-14896) Deprecate HiveContext in Python

2016-04-29 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-14896:
--
Summary: Deprecate HiveContext in Python  (was: Remove HiveContext in 
Python)

> Deprecate HiveContext in Python
> ---
>
> Key: SPARK-14896
> URL: https://issues.apache.org/jira/browse/SPARK-14896
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>







[jira] [Resolved] (SPARK-15019) Propagate all Spark Confs to HiveConf created in HiveClientImpl

2016-04-29 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15019.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Propagate all Spark Confs to HiveConf created in HiveClientImpl
> ---
>
> Key: SPARK-15019
> URL: https://issues.apache.org/jira/browse/SPARK-15019
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yin Huai
>Assignee: Yin Huai
> Fix For: 2.0.0
>
>
> Right now, the HiveConf created in HiveClientImpl only picks up confs set at 
> runtime or in hive-site.xml. We should also propagate Spark confs to it, so 
> users do not have to use hive-site.xml to set the warehouse location and 
> metastore URL.
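
A sketch of what this enables: configuring the warehouse and metastore purely
through Spark confs (the metastore URI below is a placeholder):

{code}
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-conf-demo")
         # both settings propagate to the HiveConf built in HiveClientImpl
         .config("spark.sql.warehouse.dir", "/tmp/spark-warehouse")
         .config("hive.metastore.uris", "thrift://localhost:9083")
         .enableHiveSupport()
         .getOrCreate())
spark.sql("SHOW TABLES").show()
{code}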






[jira] [Closed] (SPARK-14895) SparkSession Python API

2016-04-29 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-14895.
-
Resolution: Duplicate

> SparkSession Python API
> ---
>
> Key: SPARK-14895
> URL: https://issues.apache.org/jira/browse/SPARK-14895
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>







[jira] [Created] (SPARK-15068) Use proper metastore warehouse path

2016-05-02 Thread Andrew Or (JIRA)
Andrew Or created SPARK-15068:
-

 Summary: Use proper metastore warehouse path
 Key: SPARK-15068
 URL: https://issues.apache.org/jira/browse/SPARK-15068
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or


Today, we use "" empty string without Hive, and "/user/hive/warehouse", which 
is not a great default since it probably doesn't exist on the box. Instead, it 
would be better to use a subdir inside `user.dir` or something.






[jira] [Updated] (SPARK-15068) Use proper metastore warehouse path

2016-05-02 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15068:
--
Description: Today, we use "" empty string without Hive, and 
"/user/hive/warehouse" with Hive, which is not a great default since it 
probably doesn't exist on the box. Instead, it would be better to use a subdir 
inside `user.dir` or something.  (was: Today, we use "" empty string without 
Hive, and "/user/hive/warehouse", which is not a great default since it 
probably doesn't exist on the box. Instead, it would be better to use a subdir 
inside `user.dir` or something.)

> Use proper metastore warehouse path
> ---
>
> Key: SPARK-15068
> URL: https://issues.apache.org/jira/browse/SPARK-15068
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> Today, we use "" empty string without Hive, and "/user/hive/warehouse" with 
> Hive, which is not a great default since it probably doesn't exist on the 
> box. Instead, it would be better to use a subdir inside `user.dir` or 
> something.






[jira] [Resolved] (SPARK-15068) Use proper metastore warehouse path

2016-05-02 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15068.
---
Resolution: Duplicate

> Use proper metastore warehouse path
> ---
>
> Key: SPARK-15068
> URL: https://issues.apache.org/jira/browse/SPARK-15068
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> Today, we use "" empty string without Hive, and "/user/hive/warehouse" with 
> Hive, which is not a great default since it probably doesn't exist on the 
> box. Instead, it would be better to use a subdir inside `user.dir` or 
> something.






[jira] [Updated] (SPARK-15037) Use SparkSession instead of SQLContext in testsuites

2016-05-02 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15037:
--
Summary: Use SparkSession instead of SQLContext in testsuites  (was: Use 
SparkSession instread of SQLContext in testsuites)

> Use SparkSession instead of SQLContext in testsuites
> 
>
> Key: SPARK-15037
> URL: https://issues.apache.org/jira/browse/SPARK-15037
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Dongjoon Hyun
>
> This issue aims to update the existing testsuites to use `SparkSession` 
> instead of `SQLContext`, since `SQLContext` exists just for backward 
> compatibility.






[jira] [Assigned] (SPARK-15073) Make SparkSession constructors private

2016-05-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or reassigned SPARK-15073:
-

Assignee: Andrew Or

> Make SparkSession constructors private
> --
>
> Key: SPARK-15073
> URL: https://issues.apache.org/jira/browse/SPARK-15073
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Andrew Or
> Fix For: 2.0.0
>
>
> So users have to use the Builder pattern.






[jira] [Updated] (SPARK-15084) Use builder pattern to create SparkSession in PySpark

2016-05-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15084:
--
Assignee: Dongjoon Hyun

> Use builder pattern to create SparkSession in PySpark
> -
>
> Key: SPARK-15084
> URL: https://issues.apache.org/jira/browse/SPARK-15084
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 2.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>
> This is a Python port of SPARK-15052.
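
For reference, the resulting PySpark builder pattern (assumes Spark 2.0+; the
config key is a placeholder):

{code}
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("local[*]")
         .appName("builder-demo")
         .config("spark.some.config.option", "some-value")  # placeholder key
         .getOrCreate())
{code}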






[jira] [Updated] (SPARK-15072) Remove SparkSession.withHiveSupport

2016-05-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15072:
--
Assignee: Sandeep Singh

> Remove SparkSession.withHiveSupport
> ---
>
> Key: SPARK-15072
> URL: https://issues.apache.org/jira/browse/SPARK-15072
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Sandeep Singh
> Fix For: 2.0.0
>
>







[jira] [Resolved] (SPARK-14422) Improve handling of optional configs in SQLConf

2016-05-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-14422.
---
  Resolution: Fixed
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Improve handling of optional configs in SQLConf
> ---
>
> Key: SPARK-14422
> URL: https://issues.apache.org/jira/browse/SPARK-14422
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Marcelo Vanzin
>Priority: Minor
> Fix For: 2.0.0
>
>
> As Michael showed here: 
> https://github.com/apache/spark/pull/12119/files/69aa1a005cc7003ab62d6dfcdef42181b053eaed#r58634150
> Handling of optional configs in SQLConf is a little sub-optimal right now. We 
> should clean that up.






[jira] [Updated] (SPARK-14422) Improve handling of optional configs in SQLConf

2016-05-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-14422:
--
Assignee: Sandeep Singh

> Improve handling of optional configs in SQLConf
> ---
>
> Key: SPARK-14422
> URL: https://issues.apache.org/jira/browse/SPARK-14422
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Marcelo Vanzin
>Assignee: Sandeep Singh
>Priority: Minor
> Fix For: 2.0.0
>
>
> As Michael showed here: 
> https://github.com/apache/spark/pull/12119/files/69aa1a005cc7003ab62d6dfcdef42181b053eaed#r58634150
> Handling of optional configs in SQLConf is a little sub-optimal right now. We 
> should clean that up.






[jira] [Resolved] (SPARK-14645) non local Python resource doesn't work with Mesos cluster mode

2016-05-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-14645.
---
  Resolution: Fixed
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> non local Python resource doesn't work with Mesos cluster mode
> --
>
> Key: SPARK-14645
> URL: https://issues.apache.org/jira/browse/SPARK-14645
> Project: Spark
>  Issue Type: Bug
>Reporter: Timothy Chen
> Fix For: 2.0.0
>
>
> Currently SparkSubmit explicitly disallows non-local Python resources in 
> cluster mode with Mesos, even though they are actually supported.






[jira] [Resolved] (SPARK-15084) Use builder pattern to create SparkSession in PySpark

2016-05-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15084.
---
  Resolution: Fixed
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Use builder pattern to create SparkSession in PySpark
> -
>
> Key: SPARK-15084
> URL: https://issues.apache.org/jira/browse/SPARK-15084
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 2.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
> Fix For: 2.0.0
>
>
> This is a Python port of SPARK-15052.






[jira] [Resolved] (SPARK-15097) Import fails for someDataset.sqlContext.implicits._

2016-05-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15097.
---
  Resolution: Fixed
Assignee: Koert Kuipers
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Import fails for someDataset.sqlContext.implicits._
> ---
>
> Key: SPARK-15097
> URL: https://issues.apache.org/jira/browse/SPARK-15097
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
> Environment: spark-2.0.0-SNAPSHOT
>Reporter: koert kuipers
>Assignee: Koert Kuipers
> Fix For: 2.0.0
>
>
> With the introduction of SparkSession, sqlContext changed from being a lazy 
> val to a def inside Dataset. However, this is troublesome if you want to do:
> import someDataset.sqlContext.implicits._
> You get this error:
> stable identifier required, but someDataset.sqlContext.implicits found.






[jira] [Resolved] (SPARK-14414) Make error messages consistent across DDLs

2016-05-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-14414.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Make error messages consistent across DDLs
> --
>
> Key: SPARK-14414
> URL: https://issues.apache.org/jira/browse/SPARK-14414
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
> Fix For: 2.0.0
>
>
> There are many different error messages right now when the user tries to run 
> something that's not supported. We might throw AnalysisException or 
> ParseException or NoSuchFunctionException etc. We should make all of these 
> consistent before 2.0.






[jira] [Updated] (SPARK-14645) non local Python resource doesn't work with Mesos cluster mode

2016-05-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-14645:
--
Assignee: Timothy Chen

> non local Python resource doesn't work with Mesos cluster mode
> --
>
> Key: SPARK-14645
> URL: https://issues.apache.org/jira/browse/SPARK-14645
> Project: Spark
>  Issue Type: Bug
>Reporter: Timothy Chen
>Assignee: Timothy Chen
> Fix For: 2.0.0
>
>
> Currently SparkSubmit explicitly disallows non-local Python resources in 
> cluster mode with Mesos, even though they are actually supported.






[jira] [Resolved] (SPARK-13269) Expose more executor stats in stable status API

2016-05-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-13269.
---
   Resolution: Fixed
 Assignee: Wenchen Fan
Fix Version/s: 2.0.0

> Expose more executor stats in stable status API
> ---
>
> Key: SPARK-13269
> URL: https://issues.apache.org/jira/browse/SPARK-13269
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Andrew Or
>Assignee: Wenchen Fan
> Fix For: 2.0.0
>
>
> Currently the stable status API is quite limited; it exposes only a small 
> subset of the things exposed by JobProgressListener. It is useful for very 
> high-level querying but falls short when the developer wants to build an 
> application on top of Spark with deeper integration.
> In this issue I propose that we expose at least two things:
> - which executors are running tasks, and
> - how much each executor has cached in memory and on disk.
> The goal is not to expose exactly these two things, but to expose something 
> that would allow the developer to learn about them. These concepts are 
> fundamental to Spark's design, so there's almost no chance that they will 
> go away in the future.






[jira] [Commented] (SPARK-13269) Expose more executor stats in stable status API

2016-05-03 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15270067#comment-15270067
 ] 

Andrew Or commented on SPARK-13269:
---

Oops actually this was already done in SPARK-14069. Closing this as duplicate.

> Expose more executor stats in stable status API
> ---
>
> Key: SPARK-13269
> URL: https://issues.apache.org/jira/browse/SPARK-13269
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Andrew Or
> Fix For: 2.0.0
>
>
> Currently the stable status API is quite limited; it exposes only a small 
> subset of the things exposed by JobProgressListener. It is useful for very 
> high-level querying but falls short when the developer wants to build an 
> application on top of Spark with deeper integration.
> In this issue I propose that we expose at least two things:
> - which executors are running tasks, and
> - how much each executor has cached in memory and on disk.
> The goal is not to expose exactly these two things, but to expose something 
> that would allow the developer to learn about them. These concepts are 
> fundamental to Spark's design, so there's almost no chance that they will 
> go away in the future.






[jira] [Resolved] (SPARK-12299) Remove history serving functionality from standalone Master

2016-05-04 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-12299.
---
   Resolution: Fixed
 Assignee: Bryan Cutler
Fix Version/s: 2.0.0

> Remove history serving functionality from standalone Master
> ---
>
> Key: SPARK-12299
> URL: https://issues.apache.org/jira/browse/SPARK-12299
> Project: Spark
>  Issue Type: Sub-task
>  Components: Deploy
>Affects Versions: 1.0.0
>Reporter: Andrew Or
>Assignee: Bryan Cutler
> Fix For: 2.0.0
>
>
> The standalone Master currently continues to serve the historical UIs of 
> applications that have completed and enabled event logging. This poses 
> problems, however, if the event log is very large, e.g. SPARK-6270. The 
> Master might OOM or hang while it rebuilds the UI, rejecting applications in 
> the meantime.
> Personally, I have had to make modifications in the code to disable this 
> myself, because I wanted to use event logging in standalone mode for 
> applications that produce a lot of logging.
> Removing this from the Master would simplify the process significantly. This 
> issue supersedes SPARK-12062.






[jira] [Resolved] (SPARK-15126) RuntimeConfig.set should return Unit rather than RuntimeConfig itself

2016-05-04 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15126.
---
Resolution: Fixed

> RuntimeConfig.set should return Unit rather than RuntimeConfig itself
> -
>
> Key: SPARK-15126
> URL: https://issues.apache.org/jira/browse/SPARK-15126
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
> Fix For: 2.0.0
>
>
> Currently we return RuntimeConfig itself to facilitate chaining. However, it 
> makes the output in interactive environments (e.g. notebooks, the Scala REPL) 
> weird.






[jira] [Resolved] (SPARK-15121) Improve logging of external shuffle handler

2016-05-04 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15121.
---
  Resolution: Fixed
Assignee: Thomas Graves
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Improve logging of external shuffle handler
> ---
>
> Key: SPARK-15121
> URL: https://issues.apache.org/jira/browse/SPARK-15121
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 1.6.1
>Reporter: Thomas Graves
>Assignee: Thomas Graves
> Fix For: 2.0.0
>
>
> I want to get more information about who is connecting to the Spark external 
> shuffle handler, so I want to enhance the OpenBlocks call in 
> ExternalShuffleBlockHandler.






[jira] [Updated] (SPARK-15121) Improve logging of external shuffle handler

2016-05-04 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15121:
--
Priority: Minor  (was: Major)

> Improve logging of external shuffle handler
> ---
>
> Key: SPARK-15121
> URL: https://issues.apache.org/jira/browse/SPARK-15121
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 1.6.1
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Minor
> Fix For: 2.0.0
>
>
> I want to get more information about who is connecting to the Spark external 
> shuffle handler, so I want to enhance the OpenBlocks call in 
> ExternalShuffleBlockHandler.






[jira] [Resolved] (SPARK-13001) Coarse-grained Mesos scheduler should reject offers for longer period of time when reached max cores

2016-05-04 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-13001.
---
  Resolution: Fixed
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Coarse-grained Mesos scheduler should reject offers for longer period of time 
> when reached max cores
> 
>
> Key: SPARK-13001
> URL: https://issues.apache.org/jira/browse/SPARK-13001
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos, Scheduler
>Affects Versions: 1.5.0, 1.5.1, 1.5.2, 1.6.0
>Reporter: Sebastien Rainville
> Fix For: 2.0.0
>
>
> Similar to https://issues.apache.org/jira/browse/SPARK-10471 but for "reached 
> max cores".
> The coarse-grained Mesos scheduler accepts every offer that matches the 
> requirements until it reaches "spark.cores.max", at which point it will 
> reject every offer. Even though Spark will never launch tasks on these 
> offers, Mesos keeps sending the same offers every 5 seconds, making them 
> unavailable for other frameworks.
> Spark should reject those offers for a longer period of time to prevent offer 
> starvation when running a lot of frameworks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15031) Use SparkSession in Scala/Python/Java example.

2016-05-04 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15031.
---
  Resolution: Fixed
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Use SparkSession in Scala/Python/Java example.
> --
>
> Key: SPARK-15031
> URL: https://issues.apache.org/jira/browse/SPARK-15031
> Project: Spark
>  Issue Type: Sub-task
>  Components: Examples
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
> Fix For: 2.0.0
>
>
> This PR aims to update the Scala/Python/Java examples by replacing 
> `SQLContext` with the newly added `SparkSession`.
> - Use the *SparkSession Builder Pattern* in 154 files (Scala 55, Java 52, 
> Python 47).
> - Add `getConf` to the Python `SparkContext` class: python/pyspark/context.py
> - Replace the *SQLContext Singleton Pattern* with the *SparkSession Singleton 
> Pattern* (see the sketch below):
>   - `SqlNetworkWordCount.scala`
>   - `JavaSqlNetworkWordCount.java`
>   - `sql_network_wordcount.py`
> Now, `SQLContexts` are used only in the R examples and the following two 
> Python examples. The Python examples are untouched in this PR since they 
> already fail with some unknown issue.
> - `simple_params_example.py`
> - `aft_survival_regression.py`
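
The *SparkSession Singleton Pattern* referenced above, roughly as it appears
in `sql_network_wordcount.py` (a sketch):

{code}
from pyspark.sql import SparkSession

def getSparkSessionInstance(sparkConf):
    # lazily create exactly one session per process and reuse it afterwards
    if "sparkSessionSingletonInstance" not in globals():
        globals()["sparkSessionSingletonInstance"] = (SparkSession.builder
                                                      .config(conf=sparkConf)
                                                      .getOrCreate())
    return globals()["sparkSessionSingletonInstance"]
{code}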






[jira] [Updated] (SPARK-13001) Coarse-grained Mesos scheduler should reject offers for longer period of time when reached max cores

2016-05-04 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-13001:
--
Assignee: Sebastien Rainville

> Coarse-grained Mesos scheduler should reject offers for longer period of time 
> when reached max cores
> 
>
> Key: SPARK-13001
> URL: https://issues.apache.org/jira/browse/SPARK-13001
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos, Scheduler
>Affects Versions: 1.5.0, 1.5.1, 1.5.2, 1.6.0
>Reporter: Sebastien Rainville
>Assignee: Sebastien Rainville
> Fix For: 2.0.0
>
>
> Similar to https://issues.apache.org/jira/browse/SPARK-10471 but for "reached 
> max cores".
> The coarse-grained Mesos scheduler accepts every offer that matches the 
> requirements until it reaches "spark.cores.max", at which point it will 
> reject every offer. Even though Spark will never launch tasks on these 
> offers, Mesos keeps sending the same offers every 5 seconds, making them 
> unavailable for other frameworks.
> Spark should reject those offers for a longer period of time to prevent offer 
> starvation when running a lot of frameworks.






[jira] [Resolved] (SPARK-15116) In REPL we should create SparkSession first and get SparkContext from it

2016-05-04 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15116.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> In REPL we should create SparkSession first and get SparkContext from it
> 
>
> Key: SPARK-15116
> URL: https://issues.apache.org/jira/browse/SPARK-15116
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
> Fix For: 2.0.0
>
>







[jira] [Updated] (SPARK-15045) Remove dead code in TaskMemoryManager.cleanUpAllAllocatedMemory for pageTable

2016-05-04 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15045:
--
Priority: Major  (was: Trivial)

> Remove dead code in TaskMemoryManager.cleanUpAllAllocatedMemory for pageTable
> -
>
> Key: SPARK-15045
> URL: https://issues.apache.org/jira/browse/SPARK-15045
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
>
> Unless my eyes trick me, {{TaskMemoryManager}} first clears {{pageTable}} 
> in a synchronized block, then does it again right after the block. I think 
> the cleanup outside the synchronized block is dead code.
> See 
> https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java#L382-L397
>  with the relevant snippet pasted below:
> {code}
>   public long cleanUpAllAllocatedMemory() {
> synchronized (this) {
>   Arrays.fill(pageTable, null);
>   ...
> }
> for (MemoryBlock page : pageTable) {
>   if (page != null) {
> memoryManager.tungstenMemoryAllocator().free(page);
>   }
> }
> Arrays.fill(pageTable, null);
>...
> {code}






[jira] [Updated] (SPARK-15037) Use SparkSession instead of SQLContext in testsuites

2016-05-04 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15037:
--
Assignee: Sandeep Singh

> Use SparkSession instead of SQLContext in testsuites
> 
>
> Key: SPARK-15037
> URL: https://issues.apache.org/jira/browse/SPARK-15037
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Dongjoon Hyun
>Assignee: Sandeep Singh
>
> This issue aims to update the existing testsuites to use `SparkSession` 
> instead of `SQLContext`, since `SQLContext` exists just for backward 
> compatibility.






[jira] [Resolved] (SPARK-14896) Deprecate HiveContext in Python

2016-05-04 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-14896.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Deprecate HiveContext in Python
> ---
>
> Key: SPARK-14896
> URL: https://issues.apache.org/jira/browse/SPARK-14896
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Critical
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15072) Remove SparkSession.withHiveSupport

2016-05-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15072.
---
Resolution: Fixed

> Remove SparkSession.withHiveSupport
> ---
>
> Key: SPARK-15072
> URL: https://issues.apache.org/jira/browse/SPARK-15072
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Sandeep Singh
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15135) Make sure SparkSession thread safe

2016-05-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15135.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Make sure SparkSession thread safe
> --
>
> Key: SPARK-15135
> URL: https://issues.apache.org/jira/browse/SPARK-15135
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
> Fix For: 2.0.0
>
>
> Fixed non-thread-safe classes used by SparkSession.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15134) Indent SparkSession builder patterns and update binary_classification_metrics_example.py

2016-05-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15134.
---
  Resolution: Fixed
Assignee: Dongjoon Hyun
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Indent SparkSession builder patterns and update 
> binary_classification_metrics_example.py
> 
>
> Key: SPARK-15134
> URL: https://issues.apache.org/jira/browse/SPARK-15134
> Project: Spark
>  Issue Type: Task
>  Components: Examples
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 2.0.0
>
>
> This issue addresses the comments in SPARK-15031 and also fixes java-linter 
> errors.
> - Use multiline format in SparkSession builder patterns.
> - Update `binary_classification_metrics_example.py` to use `SparkSession`.
> - Fix Java Linter errors (in SPARK-13745, SPARK-15031, and so far)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15158) Too aggressive logging in SizeBasedRollingPolicy?

2016-05-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15158.
---
  Resolution: Fixed
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Too aggressive logging in SizeBasedRollingPolicy?
> -
>
> Key: SPARK-15158
> URL: https://issues.apache.org/jira/browse/SPARK-15158
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.1
>Reporter: Kai Wang
>Priority: Trivial
> Fix For: 2.0.0
>
>
> The questionable line is this: 
> https://github.com/apache/spark/blob/3e27940a19e7bab448f1af11d2065ecd1ec66197/core/src/main/scala/org/apache/spark/util/logging/RollingPolicy.scala#L116
> This will output a message *whenever* anything is logged at executor level. 
> Like the following:
> SizeBasedRollingPolicy:59 83 + 140796 > 1048576
> SizeBasedRollingPolicy:59 83 + 140879 > 1048576
> SizeBasedRollingPolicy:59 83 + 140962 > 1048576
> ...
> This seems too aggressive. Should this at least be downgraded to debug level?
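A hedged, self-contained sketch of the suggested downgrade (class and field names are paraphrased, not the actual RollingPolicy code):

{code}
import org.slf4j.LoggerFactory

// Log the size check at DEBUG so it only appears when the logger is
// explicitly turned up, not on every executor write.
class SizeBasedPolicySketch(rolloverSizeBytes: Long) {
  private val log = LoggerFactory.getLogger(getClass)
  private var bytesWrittenSinceRollover = 0L

  def shouldRollover(bytesToBeWritten: Long): Boolean = {
    log.debug(s"$bytesToBeWritten + $bytesWrittenSinceRollover > $rolloverSizeBytes")
    bytesToBeWritten + bytesWrittenSinceRollover > rolloverSizeBytes
  }
}
{code}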



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-9926) Parallelize file listing for partitioned Hive table

2016-05-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-9926.
--
  Resolution: Fixed
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Parallelize file listing for partitioned Hive table
> ---
>
> Key: SPARK-9926
> URL: https://issues.apache.org/jira/browse/SPARK-9926
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.4.1, 1.5.0
>Reporter: Cheolsoo Park
>Assignee: Ryan Blue
> Fix For: 2.0.0
>
>
> In Spark SQL, short queries like {{select * from table limit 10}} run very 
> slowly against partitioned Hive tables because of file listing. In 
> particular, if a large number of partitions are scanned on storage like S3, 
> the queries run extremely slowly. Here are some example benchmarks in my 
> environment-
> * Parquet-backed Hive table
> * Partitioned by dateint and hour
> * Stored on S3
> ||\# of partitions||\# of files||runtime||query||
> |1|972|30 secs|select * from nccp_log where dateint=20150601 and hour=0 limit 10;|
> |24|13646|6 mins|select * from nccp_log where dateint=20150601 limit 10;|
> |240|136222|1 hour|select * from nccp_log where dateint>=20150601 and dateint<=20150610 limit 10;|
> The problem is that {{TableReader}} constructs a separate HadoopRDD per Hive 
> partition path and groups them into a UnionRDD. Then, all the input files are 
> listed sequentially. In other tools such as Hive and Pig, this can be solved 
> by setting 
> [mapreduce.input.fileinputformat.list-status.num-threads|https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml]
>  high. But in Spark, since each HadoopRDD lists only one partition path, 
> setting this property doesn't help.
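To make the parallelization idea concrete, a generic Scala sketch of listing many partition directories at once (the helper name, thread-pool handling, and timeout are assumptions, not the eventual patch):

{code}
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileStatus, Path}

// List each partition directory on its own thread instead of sequentially.
def listPartitionsInParallel(
    partitionPaths: Seq[Path],
    hadoopConf: Configuration,
    numThreads: Int): Seq[FileStatus] = {
  val pool = Executors.newFixedThreadPool(numThreads)
  implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
  try {
    val listings = partitionPaths.map { path =>
      Future(path.getFileSystem(hadoopConf).listStatus(path).toSeq)
    }
    Await.result(Future.sequence(listings), 30.minutes).flatten
  } finally {
    pool.shutdown()
  }
}
{code}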



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15158) Too aggressive logging in SizeBasedRollingPolicy?

2016-05-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15158:
--
Assignee: Kai Wang

> Too aggressive logging in SizeBasedRollingPolicy?
> -
>
> Key: SPARK-15158
> URL: https://issues.apache.org/jira/browse/SPARK-15158
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.1
>Reporter: Kai Wang
>Assignee: Kai Wang
>Priority: Trivial
> Fix For: 2.0.0
>
>
> The questionable line is this: 
> https://github.com/apache/spark/blob/3e27940a19e7bab448f1af11d2065ecd1ec66197/core/src/main/scala/org/apache/spark/util/logging/RollingPolicy.scala#L116
> This will output a message *whenever* anything is logged at executor level. 
> Like the following:
> SizeBasedRollingPolicy:59 83 + 140796 > 1048576
> SizeBasedRollingPolicy:59 83 + 140879 > 1048576
> SizeBasedRollingPolicy:59 83 + 140962 > 1048576
> ...
> This seems too aggressive. Should this at least be downgraded to debug level?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14893) Re-enable HiveSparkSubmitSuite SPARK-8489 test after HiveContext is removed

2016-05-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-14893.
---
   Resolution: Fixed
 Assignee: Dilip Biswal
Fix Version/s: 2.0.0

> Re-enable HiveSparkSubmitSuite SPARK-8489 test after HiveContext is removed
> ---
>
> Key: SPARK-14893
> URL: https://issues.apache.org/jira/browse/SPARK-14893
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Dilip Biswal
> Fix For: 2.0.0
>
>
> The test was disabled in https://github.com/apache/spark/pull/12585. To 
> re-enable it we need to rebuild the jar using the updated source code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15166) Move hive-specific conf setting to HiveSharedState

2016-05-05 Thread Andrew Or (JIRA)
Andrew Or created SPARK-15166:
-

 Summary: Move hive-specific conf setting to HiveSharedState
 Key: SPARK-15166
 URL: https://issues.apache.org/jira/browse/SPARK-15166
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15166) Move hive-specific conf setting from SparkSession

2016-05-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15166:
--
Summary: Move hive-specific conf setting from SparkSession  (was: Move 
hive-specific conf setting to HiveSharedState)

> Move hive-specific conf setting from SparkSession
> -
>
> Key: SPARK-15166
> URL: https://issues.apache.org/jira/browse/SPARK-15166
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15167) Add public catalog implementation method to SparkSession

2016-05-05 Thread Andrew Or (JIRA)
Andrew Or created SPARK-15167:
-

 Summary: Add public catalog implementation method to SparkSession
 Key: SPARK-15167
 URL: https://issues.apache.org/jira/browse/SPARK-15167
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or


Right now there's no way to check whether a given SparkSession has Hive 
support. You can do `spark.conf.get("spark.sql.catalogImplementation")` but 
that's supposed to be hidden from the user.
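Spelled out, the workaround looks like this (a sketch; the conf key is internal, as noted above, and the `in-memory` fallback value is my assumption):

{code}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Workaround: peek at the internal conf to infer Hive support.
val usesHive =
  spark.conf.get("spark.sql.catalogImplementation", "in-memory") == "hive"
println(s"Hive catalog enabled: $usesHive")
{code}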



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15152) Scaladoc and Code style Improvements

2016-05-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15152.
---
  Resolution: Fixed
Assignee: Jacek Laskowski
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Scaladoc and Code style Improvements
> 
>
> Key: SPARK-15152
> URL: https://issues.apache.org/jira/browse/SPARK-15152
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, ML, Spark Core, SQL, YARN
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
>Assignee: Jacek Laskowski
>Priority: Minor
> Fix For: 2.0.0
>
>
> While doing code reviews for the Spark Notes I found many places with typos 
> and incorrect code style.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13566) Deadlock between MemoryStore and BlockManager

2016-05-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-13566:
--
Assignee: cen yuhai

> Deadlock between MemoryStore and BlockManager
> -
>
> Key: SPARK-13566
> URL: https://issues.apache.org/jira/browse/SPARK-13566
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager, Spark Core
>Affects Versions: 1.6.0
> Environment: Spark 1.6.0 hadoop2.2.0 jdk1.8.0_65 centOs 6.2
>Reporter: cen yuhai
>Assignee: cen yuhai
>
> ===
> "block-manager-slave-async-thread-pool-1":
> at org.apache.spark.storage.MemoryStore.remove(MemoryStore.scala:216)
> - waiting to lock <0x0005895b09b0> (a 
> org.apache.spark.memory.UnifiedMemoryManager)
> at 
> org.apache.spark.storage.BlockManager.removeBlock(BlockManager.scala:1114)
> - locked <0x00058ed6aae0> (a org.apache.spark.storage.BlockInfo)
> at 
> org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1101)
> at 
> org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1101)
> at scala.collection.immutable.Set$Set2.foreach(Set.scala:94)
> at 
> org.apache.spark.storage.BlockManager.removeBroadcast(BlockManager.scala:1101)
> at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply$mcI$sp(BlockManagerSlaveEndpoint.scala:65)
> at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:65)
> at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:65)
> at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$1.apply(BlockManagerSlaveEndpoint.scala:84)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> "Executor task launch worker-10":
> at 
> org.apache.spark.storage.BlockManager.dropFromMemory(BlockManager.scala:1032)
> - waiting to lock <0x00059a0988b8> (a 
> org.apache.spark.storage.BlockInfo)
> at 
> org.apache.spark.storage.BlockManager.dropFromMemory(BlockManager.scala:1009)
> at 
> org.apache.spark.storage.MemoryStore$$anonfun$evictBlocksToFreeSpace$2.apply(MemoryStore.scala:460)
> at 
> org.apache.spark.storage.MemoryStore$$anonfun$evictBlocksToFreeSpace$2.apply(MemoryStore.scala:449)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15167) Add public catalog implementation method to SparkSession

2016-05-06 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15167.
---
Resolution: Won't Fix

> Add public catalog implementation method to SparkSession
> 
>
> Key: SPARK-15167
> URL: https://issues.apache.org/jira/browse/SPARK-15167
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> Right now there's no way to check whether a given SparkSession has Hive 
> support. You can do `spark.conf.get("spark.sql.catalogImplementation")` but 
> that's supposed to be hidden from the user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13566) Deadlock between MemoryStore and BlockManager

2016-05-06 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-13566.
---
   Resolution: Fixed
Fix Version/s: 1.6.2

> Deadlock between MemoryStore and BlockManager
> -
>
> Key: SPARK-13566
> URL: https://issues.apache.org/jira/browse/SPARK-13566
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager, Spark Core
>Affects Versions: 1.6.0
> Environment: Spark 1.6.0 hadoop2.2.0 jdk1.8.0_65 centOs 6.2
>Reporter: cen yuhai
>Assignee: cen yuhai
> Fix For: 1.6.2
>
>
> ===
> "block-manager-slave-async-thread-pool-1":
> at org.apache.spark.storage.MemoryStore.remove(MemoryStore.scala:216)
> - waiting to lock <0x0005895b09b0> (a 
> org.apache.spark.memory.UnifiedMemoryManager)
> at 
> org.apache.spark.storage.BlockManager.removeBlock(BlockManager.scala:1114)
> - locked <0x00058ed6aae0> (a org.apache.spark.storage.BlockInfo)
> at 
> org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1101)
> at 
> org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1101)
> at scala.collection.immutable.Set$Set2.foreach(Set.scala:94)
> at 
> org.apache.spark.storage.BlockManager.removeBroadcast(BlockManager.scala:1101)
> at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply$mcI$sp(BlockManagerSlaveEndpoint.scala:65)
> at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:65)
> at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:65)
> at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$1.apply(BlockManagerSlaveEndpoint.scala:84)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> "Executor task launch worker-10":
> at 
> org.apache.spark.storage.BlockManager.dropFromMemory(BlockManager.scala:1032)
> - waiting to lock <0x00059a0988b8> (a 
> org.apache.spark.storage.BlockInfo)
> at 
> org.apache.spark.storage.BlockManager.dropFromMemory(BlockManager.scala:1009)
> at 
> org.apache.spark.storage.MemoryStore$$anonfun$evictBlocksToFreeSpace$2.apply(MemoryStore.scala:460)
> at 
> org.apache.spark.storage.MemoryStore$$anonfun$evictBlocksToFreeSpace$2.apply(MemoryStore.scala:449)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15093) create/delete/rename directory for InMemoryCatalog operations if needed

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15093.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> create/delete/rename directory for InMemoryCatalog operations if needed
> ---
>
> Key: SPARK-15093
> URL: https://issues.apache.org/jira/browse/SPARK-15093
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15223) spark.executor.logs.rolling.maxSize wrongly referred to as spark.executor.logs.rolling.size.maxBytes

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15223.
---
  Resolution: Fixed
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> spark.executor.logs.rolling.maxSize wrongly referred to as 
> spark.executor.logs.rolling.size.maxBytes
> 
>
> Key: SPARK-15223
> URL: https://issues.apache.org/jira/browse/SPARK-15223
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 1.6.1
>Reporter: Philipp Hoffmann
>Priority: Trivial
> Fix For: 2.0.0
>
>
> The configuration setting {{spark.executor.logs.rolling.size.maxBytes}} was 
> changed to {{spark.executor.logs.rolling.maxSize}} in 1.4 or so. There is 
> however still a reference in the documentation using the old name.
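For anyone updating configs alongside the docs, a small sketch of the current key (the one-megabyte value is an arbitrary example):

{code}
import org.apache.spark.SparkConf

// Old key (pre-1.4): spark.executor.logs.rolling.size.maxBytes
// Current key:       spark.executor.logs.rolling.maxSize
val conf = new SparkConf()
  .set("spark.executor.logs.rolling.strategy", "size")
  .set("spark.executor.logs.rolling.maxSize", (1024 * 1024).toString)
{code}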



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15223) spark.executor.logs.rolling.maxSize wrongly referred to as spark.executor.logs.rolling.size.maxBytes

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15223:
--
Priority: Minor  (was: Trivial)

> spark.executor.logs.rolling.maxSize wrongly referred to as 
> spark.executor.logs.rolling.size.maxBytes
> 
>
> Key: SPARK-15223
> URL: https://issues.apache.org/jira/browse/SPARK-15223
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 1.6.1
>Reporter: Philipp Hoffmann
>Priority: Minor
> Fix For: 2.0.0
>
>
> The configuration setting {{spark.executor.logs.rolling.size.maxBytes}} was 
> changed to {{spark.executor.logs.rolling.maxSize}} in 1.4 or so. There is 
> however still a reference in the documentation using the old name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15223) spark.executor.logs.rolling.maxSize wrongly referred to as spark.executor.logs.rolling.size.maxBytes

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15223:
--
Fix Version/s: 1.6.2

> spark.executor.logs.rolling.maxSize wrongly referred to as 
> spark.executor.logs.rolling.size.maxBytes
> 
>
> Key: SPARK-15223
> URL: https://issues.apache.org/jira/browse/SPARK-15223
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 1.6.1
>Reporter: Philipp Hoffmann
>Assignee: Philipp Hoffmann
>Priority: Minor
> Fix For: 1.6.2, 2.0.0
>
>
> The configuration setting {{spark.executor.logs.rolling.size.maxBytes}} was 
> changed to {{spark.executor.logs.rolling.maxSize}} in 1.4 or so. There is 
> however still a reference in the documentation using the old name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15223) spark.executor.logs.rolling.maxSize wrongly referred to as spark.executor.logs.rolling.size.maxBytes

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15223:
--
Target Version/s: 1.6.2, 2.0.0  (was: 2.0.0)

> spark.executor.logs.rolling.maxSize wrongly referred to as 
> spark.executor.logs.rolling.size.maxBytes
> 
>
> Key: SPARK-15223
> URL: https://issues.apache.org/jira/browse/SPARK-15223
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 1.6.1
>Reporter: Philipp Hoffmann
>Assignee: Philipp Hoffmann
>Priority: Minor
> Fix For: 1.6.2, 2.0.0
>
>
> The configuration setting {{spark.executor.logs.rolling.size.maxBytes}} was 
> changed to {{spark.executor.logs.rolling.maxSize}} in 1.4 or so. There is 
> however still a reference in the documentation using the old name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15223) spark.executor.logs.rolling.maxSize wrongly referred to as spark.executor.logs.rolling.size.maxBytes

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15223:
--
Assignee: Philipp Hoffmann

> spark.executor.logs.rolling.maxSize wrongly referred to as 
> spark.executor.logs.rolling.size.maxBytes
> 
>
> Key: SPARK-15223
> URL: https://issues.apache.org/jira/browse/SPARK-15223
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 1.6.1
>Reporter: Philipp Hoffmann
>Assignee: Philipp Hoffmann
>Priority: Minor
> Fix For: 1.6.2, 2.0.0
>
>
> The configuration setting {{spark.executor.logs.rolling.size.maxBytes}} was 
> changed to {{spark.executor.logs.rolling.maxSize}} in 1.4 or so. There is 
> however still a reference in the documentation using the old name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15225) Replace SQLContext with SparkSession in Encoder documentation

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15225.
---
  Resolution: Fixed
Assignee: Liang-Chi Hsieh
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Replace SQLContext with SparkSession in Encoder documentation
> -
>
> Key: SPARK-15225
> URL: https://issues.apache.org/jira/browse/SPARK-15225
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Liang-Chi Hsieh
>Assignee: Liang-Chi Hsieh
>Priority: Minor
> Fix For: 2.0.0
>
>
> Encoder's doc mentions sqlContext.implicits._. We should use 
> sparkSession.implicits._ instead now.
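A minimal sketch of the 2.0-style import the doc should point to (the master URL is an arbitrary local setting):

{code}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
// Implicit Encoders now come from the session, not a SQLContext.
import spark.implicits._

val ds = Seq(1, 2, 3).toDS() // toDS() resolves via an implicit Encoder[Int]
{code}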



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15067) YARN executors are launched with fixed perm gen size

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15067.
---
  Resolution: Fixed
Assignee: Sean Owen
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> YARN executors are launched with fixed perm gen size
> 
>
> Key: SPARK-15067
> URL: https://issues.apache.org/jira/browse/SPARK-15067
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.0, 1.6.1
>Reporter: Renato Falchi Brandão
>Assignee: Sean Owen
>Priority: Minor
> Fix For: 2.0.0
>
>
> It is impossible to change the executors' max perm gen size using the property 
> "spark.executor.extraJavaOptions" when you are running on YARN.
> When the JVM option "-XX:MaxPermSize" is set through the property 
> "spark.executor.extraJavaOptions", Spark puts it properly in the shell command 
> that will start the JVM container but, at the end of the command, it sets 
> this option again using a fixed value of 256m, as you can see in the log I've 
> extracted:
> 2016-04-30 17:20:12 INFO  ExecutorRunnable:58 -
> ===
> YARN executor launch context:
>   env:
> CLASSPATH -> 
> {{PWD}}{{PWD}}/__spark__.jar$HADOOP_CONF_DIR/usr/hdp/current/hadoop-client/*/usr/hdp/current/hadoop-client/lib/*/usr/hdp/current/hadoop-hdfs-client/*/usr/hdp/current/hadoop-hdfs-client/lib/*/usr/hdp/current/hadoop-yarn-client/*/usr/hdp/current/hadoop-yarn-client/lib/*/usr/hdp/mr-framework/hadoop/share/hadoop/mapreduce/*:/usr/hdp/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:/usr/hdp/mr-framework/hadoop/share/hadoop/common/*:/usr/hdp/mr-framework/hadoop/share/hadoop/common/lib/*:/usr/hdp/mr-framework/hadoop/share/hadoop/yarn/*:/usr/hdp/mr-framework/hadoop/share/hadoop/yarn/lib/*:/usr/hdp/mr-framework/hadoop/share/hadoop/hdfs/*:/usr/hdp/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/current/hadoop/lib/hadoop-lzo-0.6.0.jar:/etc/hadoop/conf/secure
> SPARK_LOG_URL_STDERR -> 
> http://x0668sl.x.br:8042/node/containerlogs/container_1456962126505_329993_01_02/h_loadbd/stderr?start=-4096
> SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1456962126505_329993
> SPARK_YARN_CACHE_FILES_FILE_SIZES -> 191719054,166
> SPARK_USER -> h_loadbd
> SPARK_YARN_CACHE_FILES_VISIBILITIES -> PUBLIC,PUBLIC
> SPARK_YARN_MODE -> true
> SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1459806496093,1459808508343
> SPARK_LOG_URL_STDOUT -> 
> http://x0668sl.x.br:8042/node/containerlogs/container_1456962126505_329993_01_02/h_loadbd/stdout?start=-4096
> SPARK_YARN_CACHE_FILES -> 
> hdfs://x/user/datalab/hdp/spark/lib/spark-assembly-1.6.0.2.3.4.1-10-hadoop2.7.1.2.3.4.1-10.jar#__spark__.jar,hdfs://tlvcluster/user/datalab/hdp/spark/conf/hive-site.xml#hive-site.xml
>   command:
> {{JAVA_HOME}}/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms6144m 
> -Xmx6144m '-XX:+PrintGCDetails' '-XX:MaxPermSize=1024M' 
> '-XX:+PrintGCTimeStamps' -Djava.io.tmpdir={{PWD}}/tmp 
> '-Dspark.akka.timeout=30' '-Dspark.driver.port=62875' 
> '-Dspark.rpc.askTimeout=30' '-Dspark.rpc.lookupTimeout=30' 
> -Dspark.yarn.app.container.log.dir= -XX:MaxPermSize=256m 
> org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url 
> spark://CoarseGrainedScheduler@10.125.81.42:62875 --executor-id 1 --hostname 
> x0668sl.x.br --cores 1 --app-id application_1456962126505_329993 
> --user-class-path file:$PWD/__app__.jar 1> /stdout 2> 
> /stderr
> Analyzing the code, it is possible to see that all the options set in the 
> property "spark.executor.extraJavaOptions" are enclosed, one by one, in 
> single quotes (ExecutorRunnable.scala:151) before the launcher decides 
> whether a default value has to be provided for the option 
> "-XX:MaxPermSize" (ExecutorRunnable.scala:202).
> This decision is made by examining all the options set and looking for a 
> string starting with the value "-XX:MaxPermSize" (CommandBuilderUtils.java:328). 
> If no such string is found, the default value is set.
> Since every option now starts with a single quote, the prefix check never 
> matches, so a default value will always be provided.
> A possible solution is to change the source code of CommandBuilderUtils.java 
> at line 328:
> From-> if (arg.startsWith("-XX:MaxPermSize="))
> To-> if (arg.indexOf("-XX:MaxPermSize=") > -1)
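The proposed one-line fix is easy to sanity-check in isolation; a self-contained Scala rendering of the Java check (the quoted argument below mirrors the launch context above):

{code}
// The quoting step wraps each user option in single quotes, so the string
// the launcher later inspects is literally "'-XX:MaxPermSize=1024M'".
val quotedArgs = Seq("'-XX:MaxPermSize=1024M'")

val foundByStartsWith = quotedArgs.exists(_.startsWith("-XX:MaxPermSize="))   // false
val foundByIndexOf    = quotedArgs.exists(_.indexOf("-XX:MaxPermSize=") > -1) // true

println(s"startsWith: $foundByStartsWith, indexOf: $foundByIndexOf")
{code}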



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15220) Add hyperlink to "running application" and "completed application"

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15220.
---
  Resolution: Fixed
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Add hyperlink to "running application" and "completed application"
> --
>
> Key: SPARK-15220
> URL: https://issues.apache.org/jira/browse/SPARK-15220
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Reporter: Mao, Wei
>Priority: Minor
> Fix For: 2.0.0
>
>
> Add hyperlinks to "running application" and "completed application", so users 
> can jump to the application table directly. In my environment, I set up 1000+ 
> workers and it's painful to scroll down past the worker list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15210) Add missing @DeveloperApi annotation in sql.types

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15210:
--
Assignee: zhengruifeng

> Add missing @DeveloperApi annotation in sql.types
> -
>
> Key: SPARK-15210
> URL: https://issues.apache.org/jira/browse/SPARK-15210
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Trivial
> Fix For: 2.0.0
>
>
> The @DeveloperApi annotations for {{AbstractDataType}}, {{MapType}}, and 
> {{UserDefinedType}} are missing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15166) Move hive-specific conf setting from SparkSession

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15166.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Move hive-specific conf setting from SparkSession
> -
>
> Key: SPARK-15166
> URL: https://issues.apache.org/jira/browse/SPARK-15166
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Minor
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15210) Add missing @DeveloperApi annotation in sql.types

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15210.
---
  Resolution: Fixed
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Add missing @DeveloperApi annotation in sql.types
> -
>
> Key: SPARK-15210
> URL: https://issues.apache.org/jira/browse/SPARK-15210
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Trivial
> Fix For: 2.0.0
>
>
> The @DeveloperApi annotations for {{AbstractDataType}}, {{MapType}}, and 
> {{UserDefinedType}} are missing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-10653) Remove unnecessary things from SparkEnv

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-10653.
---
  Resolution: Fixed
Assignee: Alex Bozarth
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Remove unnecessary things from SparkEnv
> ---
>
> Key: SPARK-10653
> URL: https://issues.apache.org/jira/browse/SPARK-10653
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Andrew Or
>Assignee: Alex Bozarth
> Fix For: 2.0.0
>
>
> As of the writing of this message, there are at least two things that can be 
> removed from it:
> {code}
> @DeveloperApi
> class SparkEnv (
> val executorId: String,
> private[spark] val rpcEnv: RpcEnv,
> val serializer: Serializer,
> val closureSerializer: Serializer,
> val cacheManager: CacheManager,
> val mapOutputTracker: MapOutputTracker,
> val shuffleManager: ShuffleManager,
> val broadcastManager: BroadcastManager,
> val blockTransferService: BlockTransferService, // this one can go
> val blockManager: BlockManager,
> val securityManager: SecurityManager,
> val httpFileServer: HttpFileServer,
> val sparkFilesDir: String, // this one maybe? It's only used in 1 place.
> val metricsSystem: MetricsSystem,
> val shuffleMemoryManager: ShuffleMemoryManager,
> val executorMemoryManager: ExecutorMemoryManager, // this can go
> val outputCommitCoordinator: OutputCommitCoordinator,
> val conf: SparkConf) extends Logging {
>   ...
> }
> {code}
> We should avoid adding to this infinite list of things in SparkEnv's 
> constructors if they're not needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14021) Support custom context derived from HiveContext for SparkSQLEnv

2016-05-09 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15276820#comment-15276820
 ] 

Andrew Or commented on SPARK-14021:
---

Closing as Won't Fix because the issue is outdated after HiveContext was 
removed.

> Support custom context derived from HiveContext for SparkSQLEnv
> ---
>
> Key: SPARK-14021
> URL: https://issues.apache.org/jira/browse/SPARK-14021
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Adrian Wang
>
> This is to create a custom context for the commands bin/spark-sql and 
> sbin/start-thriftserver. Any context that is derived from HiveContext is 
> acceptable. Users need to configure the class name of the custom context in 
> the config spark.sql.context.class, and make sure the class is on the 
> classpath. This is to provide a more elegant way to make custom 
> configurations and changes for an infrastructure team.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14021) Support custom context derived from HiveContext for SparkSQLEnv

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-14021.
---
Resolution: Won't Fix

> Support custom context derived from HiveContext for SparkSQLEnv
> ---
>
> Key: SPARK-14021
> URL: https://issues.apache.org/jira/browse/SPARK-14021
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Adrian Wang
>
> This is to create a custom context for the commands bin/spark-sql and 
> sbin/start-thriftserver. Any context that is derived from HiveContext is 
> acceptable. Users need to configure the class name of the custom context in 
> the config spark.sql.context.class, and make sure the class is on the 
> classpath. This is to provide a more elegant way to make custom 
> configurations and changes for an infrastructure team.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15234) spark.catalog.listDatabases.show() is not formatted correctly

2016-05-09 Thread Andrew Or (JIRA)
Andrew Or created SPARK-15234:
-

 Summary: spark.catalog.listDatabases.show() is not formatted 
correctly
 Key: SPARK-15234
 URL: https://issues.apache.org/jira/browse/SPARK-15234
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or


{code}
scala> spark.catalog.listDatabases.show()
++---+---+
|name|description|locationUri|
++---+---+
|Database[name='de...|
|Database[name='my...|
|Database[name='so...|
++---+---+
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15234) spark.catalog.listDatabases.show() is not formatted correctly

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15234:
--
Description: 
{code}
scala> spark.catalog.listDatabases.show()
++---+---+
|name|description|locationUri|
++---+---+
|Database[name='de...|
|Database[name='my...|
|Database[name='so...|
++---+---+
{code}

It's because org.apache.spark.sql.catalog.Database is not a case class!

  was:
{code}
scala> spark.catalog.listDatabases.show()
++---+---+
|name|description|locationUri|
++---+---+
|Database[name='de...|
|Database[name='my...|
|Database[name='so...|
++---+---+
{code}


> spark.catalog.listDatabases.show() is not formatted correctly
> -
>
> Key: SPARK-15234
> URL: https://issues.apache.org/jira/browse/SPARK-15234
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>
> {code}
> scala> spark.catalog.listDatabases.show()
> ++---+---+
> |name|description|locationUri|
> ++---+---+
> |Database[name='de...|
> |Database[name='my...|
> |Database[name='so...|
> ++---+---+
> {code}
> It's because org.apache.spark.sql.catalog.Database is not a case class!
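A minimal illustration of that claim (both class names here are invented): a case class is a {{Product}}, so an implicit product encoder can split its fields into one named column per field, while a plain class offers nothing but its {{toString}}, which show() then truncates into a single column.

{code}
// Plain class: only toString is available, as in the truncated output above.
class DatabasePlain(val name: String, val description: String, val locationUri: String) {
  override def toString: String = s"Database[name='$name', ...]"
}

// Case class: a Product, so each field can become a named column in show().
case class DatabaseAsCaseClass(name: String, description: String, locationUri: String)
{code}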



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15234) spark.catalog.listDatabases.show() is not formatted correctly

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or reassigned SPARK-15234:
-

Assignee: Andrew Or

> spark.catalog.listDatabases.show() is not formatted correctly
> -
>
> Key: SPARK-15234
> URL: https://issues.apache.org/jira/browse/SPARK-15234
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> {code}
> scala> spark.catalog.listDatabases.show()
> ++---+---+
> |name|description|locationUri|
> ++---+---+
> |Database[name='de...|
> |Database[name='my...|
> |Database[name='so...|
> ++---+---+
> {code}
> It's because org.apache.spark.sql.catalog.Database is not a case class!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15236) No way to disable Hive support in REPL

2016-05-09 Thread Andrew Or (JIRA)
Andrew Or created SPARK-15236:
-

 Summary: No way to disable Hive support in REPL
 Key: SPARK-15236
 URL: https://issues.apache.org/jira/browse/SPARK-15236
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or


If you built Spark with Hive classes, there's no switch to flip to start a new 
`spark-shell` using the InMemoryCatalog. The only thing you can do now is to 
rebuild Spark again. That is quite inconvenient.
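A sketch of the kind of switch the issue asks for, expressed at session-build time (the conf key is internal today, which is part of the problem):

{code}
import org.apache.spark.sql.SparkSession

// Pick the in-memory catalog without rebuilding Spark.
val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.sql.catalogImplementation", "in-memory")
  .getOrCreate()
{code}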



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15236) No way to disable Hive support in REPL

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15236:
--
Assignee: (was: Andrew Or)

> No way to disable Hive support in REPL
> --
>
> Key: SPARK-15236
> URL: https://issues.apache.org/jira/browse/SPARK-15236
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>
> If you built Spark with Hive classes, there's no switch to flip to start a 
> new `spark-shell` using the InMemoryCatalog. The only thing you can do now is 
> to rebuild Spark again. That is quite inconvenient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15236) No way to disable Hive support in REPL

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15236:
--
Component/s: Spark Shell

> No way to disable Hive support in REPL
> --
>
> Key: SPARK-15236
> URL: https://issues.apache.org/jira/browse/SPARK-15236
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell, SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>
> If you built Spark with Hive classes, there's no switch to flip to start a 
> new `spark-shell` using the InMemoryCatalog. The only thing you can do now is 
> to rebuild Spark again. That is quite inconvenient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15037) Use SparkSession instead of SQLContext in testsuites

2016-05-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15037.
---
  Resolution: Fixed
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Use SparkSession instead of SQLContext in testsuites
> 
>
> Key: SPARK-15037
> URL: https://issues.apache.org/jira/browse/SPARK-15037
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Reporter: Dongjoon Hyun
>Assignee: Sandeep Singh
> Fix For: 2.0.0
>
>
> This issue aims to update the existing testsuites to use `SparkSession` 
> instead of `SQLContext`, since `SQLContext` exists just for backward 
> compatibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15037) Use SparkSession instead of SQLContext in testsuites

2016-05-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15037:
--
Component/s: Tests

> Use SparkSession instead of SQLContext in testsuites
> 
>
> Key: SPARK-15037
> URL: https://issues.apache.org/jira/browse/SPARK-15037
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Reporter: Dongjoon Hyun
>Assignee: Sandeep Singh
> Fix For: 2.0.0
>
>
> This issue aims to update the existing testsuites to use `SparkSession` 
> instead of `SQLContext`, since `SQLContext` exists just for backward 
> compatibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15037) Use SparkSession instead of SQLContext in testsuites

2016-05-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15037:
--
Component/s: SQL

> Use SparkSession instead of SQLContext in testsuites
> 
>
> Key: SPARK-15037
> URL: https://issues.apache.org/jira/browse/SPARK-15037
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Reporter: Dongjoon Hyun
>Assignee: Sandeep Singh
> Fix For: 2.0.0
>
>
> This issue aims to update the existing testsuites to use `SparkSession` 
> instead of `SQLContext`, since `SQLContext` exists just for backward 
> compatibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14684) Verification of partition specs in SessionCatalog

2016-05-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-14684:
--
Assignee: Xiao Li

> Verification of partition specs in SessionCatalog
> -
>
> Key: SPARK-14684
> URL: https://issues.apache.org/jira/browse/SPARK-14684
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>
> When users input an invalid partition spec, we might not be able to catch it 
> and issue an error message. Sometimes this can have disastrous results. 
> For example, previously, when we altered a table and dropped a partition with 
> an invalid spec, it could drop all the partitions due to a bug/defect in the 
> Hive Metastore API. 
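A hedged sketch of the shape such a check could take in SessionCatalog (the helper name and message are assumptions): reject any spec whose keys are not exactly the table's partition columns before it reaches the metastore.

{code}
// Fail fast on a bad spec so a malformed DROP PARTITION cannot fan out
// to every partition via the metastore bug described above.
def requireExactMatchedPartitionSpec(
    spec: Map[String, String],
    partitionColumnNames: Seq[String]): Unit = {
  require(spec.keys.toSet == partitionColumnNames.toSet,
    s"Partition spec (${spec.keys.mkString(", ")}) must match the table's " +
    s"partition columns (${partitionColumnNames.mkString(", ")})")
}
{code}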



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14603) SessionCatalog needs to check if a metadata operation is valid

2016-05-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-14603.
---
   Resolution: Fixed
 Assignee: Xiao Li
Fix Version/s: 2.0.0

> SessionCatalog needs to check if a metadata operation is valid
> --
>
> Key: SPARK-14603
> URL: https://issues.apache.org/jira/browse/SPARK-14603
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yin Huai
>Assignee: Xiao Li
>Priority: Critical
> Fix For: 2.0.0
>
>
> Since we cannot really trust that the underlying external catalog will throw 
> exceptions when there is an invalid metadata operation, let's do it in 
> SessionCatalog. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14857) Table/Database Name Validation in SessionCatalog

2016-05-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-14857:
--
Assignee: Xiao Li

> Table/Database Name Validation in SessionCatalog
> 
>
> Key: SPARK-14857
> URL: https://issues.apache.org/jira/browse/SPARK-14857
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>
> We need to validate database/table names before storing this information in 
> `ExternalCatalog`. 
> For example, if users use backticks to quote table/database names 
> containing illegal characters, these names are allowed by the Spark parser, but 
> the Hive metastore does not allow them. We need to catch them in SessionCatalog 
> and issue an appropriate error message.
> {code}
> CREATE TABLE `tab:1`  ...
> {code}
> This PR enforces the name rules of Spark SQL for `table`/`database`/`view` 
> names: they may contain only alphanumeric and underscore characters. Unlike 
> Hive, we allow names that start with an underscore. 
> The validation of function/column names will be done in a separate JIRA.
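A hedged sketch of the quoted rule (the object and method names are mine):

{code}
object NameValidation {
  // Alphanumeric and underscore only; a leading underscore is allowed.
  private val validNameFormat = "[a-zA-Z0-9_]+".r

  def validateName(name: String): Unit =
    require(validNameFormat.pattern.matcher(name).matches(),
      s"`$name` may contain only alphanumeric and underscore characters.")
}

// NameValidation.validateName("_tab_1") // ok
// NameValidation.validateName("tab:1")  // fails, as in the example above
{code}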



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15257) Require CREATE EXTERNAL TABLE to specify LOCATION

2016-05-10 Thread Andrew Or (JIRA)
Andrew Or created SPARK-15257:
-

 Summary: Require CREATE EXTERNAL TABLE to specify LOCATION
 Key: SPARK-15257
 URL: https://issues.apache.org/jira/browse/SPARK-15257
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or


Right now when the user runs `CREATE EXTERNAL TABLE` without specifying 
`LOCATION`, the table will still be created in the warehouse directory, but its 
metadata won't be deleted even when the user drops the table! This is a 
problem. We should require the user to also specify `LOCATION`.

Note: This does not apply to `CREATE EXTERNAL TABLE ... USING`, which is not 
yet supported.
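Under the proposed rule, the DDL would look like this (the table name and path are arbitrary; a sketch, not the final syntax check):

{code}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .enableHiveSupport()
  .getOrCreate()

// External tables must pin down their own storage location.
spark.sql("CREATE EXTERNAL TABLE ext_logs (id INT, msg STRING) LOCATION '/tmp/ext_logs'")

// Dropping the table removes only metadata; files under /tmp/ext_logs remain.
spark.sql("DROP TABLE ext_logs")
{code}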



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15249) Use FunctionResource instead of (String, String) in CreateFunction and CatalogFunction for resource

2016-05-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15249.
---
  Resolution: Fixed
Assignee: Sandeep Singh
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Use FunctionResource instead of (String, String) in CreateFunction and 
> CatalogFunction for resource
> ---
>
> Key: SPARK-15249
> URL: https://issues.apache.org/jira/browse/SPARK-15249
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Sandeep Singh
>Assignee: Sandeep Singh
>Priority: Minor
> Fix For: 2.0.0
>
>
> Use FunctionResource instead of (String, String) in CreateFunction and 
> CatalogFunction for resource
> see: TODO's here
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala#L36
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala#L42
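A hedged sketch of the typed resource the TODOs point at (modeled on the summary, not the merged code):

{code}
// Replace the bare (resourceType, uri) string pair with a small ADT.
sealed trait FunctionResourceType
case object JarResource extends FunctionResourceType
case object FileResource extends FunctionResourceType
case object ArchiveResource extends FunctionResourceType

case class FunctionResource(resourceType: FunctionResourceType, uri: String)

// Before: resources: Seq[(String, String)]
// After:  resources: Seq[FunctionResource]
{code}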



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15262) race condition in killing an executor and reregistering an executor

2016-05-11 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15262:
--
Target Version/s: 1.6.2, 2.0.0

> race condition in killing an executor and reregistering an executor
> ---
>
> Key: SPARK-15262
> URL: https://issues.apache.org/jira/browse/SPARK-15262
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
>Reporter: Shixiong Zhu
>
> There is a race condition when killing an executor and reregistering an 
> executor happen at the same time. Here is the execution steps to reproduce it.
> 1. master find a worker is dead
> 2. master tells driver to remove executor
> 3. driver remove executor
> 4. BlockManagerMasterEndpoint remove the block manager
> 5. executor finds it's not registered via heartbeat
> 6. executor send reregister block manager
> 7. register block manager
> 8. executor is killed by worker
> 9. CoarseGrainedSchedulerBackend ignores onDisconnected as this address is 
> not in the executor list
> 10. BlockManagerMasterEndpoint.blockManagerInfo contains dead block managers
> As BlockManagerMasterEndpoint.blockManagerInfo contains some dead block 
> managers, when we unpersist a RDD, remove a broadcast, or clean a shuffle 
> block via a RPC endpoint of a dead block manager, we will get 
> ClosedChannelException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13566) Deadlock between MemoryStore and BlockManager

2016-05-11 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280615#comment-15280615
 ] 

Andrew Or commented on SPARK-13566:
---

[~ekeddy] This only happens with the unified memory manager, so you could 
switch back to the static memory manager by setting 
`spark.memory.useLegacyMode` to true. You may observe a decrease in performance 
if you do that, however.
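In SparkConf terms, the workaround mentioned above is (legacy mode trades the deadlock away for the old static memory split):

{code}
import org.apache.spark.SparkConf

// Fall back to the pre-1.6 static memory manager.
val conf = new SparkConf().set("spark.memory.useLegacyMode", "true")
{code}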

> Deadlock between MemoryStore and BlockManager
> -
>
> Key: SPARK-13566
> URL: https://issues.apache.org/jira/browse/SPARK-13566
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager, Spark Core
>Affects Versions: 1.6.0
> Environment: Spark 1.6.0 hadoop2.2.0 jdk1.8.0_65 centOs 6.2
>Reporter: cen yuhai
>Assignee: cen yuhai
> Fix For: 1.6.2
>
>
> ===
> "block-manager-slave-async-thread-pool-1":
> at org.apache.spark.storage.MemoryStore.remove(MemoryStore.scala:216)
> - waiting to lock <0x0005895b09b0> (a 
> org.apache.spark.memory.UnifiedMemoryManager)
> at 
> org.apache.spark.storage.BlockManager.removeBlock(BlockManager.scala:1114)
> - locked <0x00058ed6aae0> (a org.apache.spark.storage.BlockInfo)
> at 
> org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1101)
> at 
> org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1101)
> at scala.collection.immutable.Set$Set2.foreach(Set.scala:94)
> at 
> org.apache.spark.storage.BlockManager.removeBroadcast(BlockManager.scala:1101)
> at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply$mcI$sp(BlockManagerSlaveEndpoint.scala:65)
> at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:65)
> at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:65)
> at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$1.apply(BlockManagerSlaveEndpoint.scala:84)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> "Executor task launch worker-10":
> at 
> org.apache.spark.storage.BlockManager.dropFromMemory(BlockManager.scala:1032)
> - waiting to lock <0x00059a0988b8> (a 
> org.apache.spark.storage.BlockInfo)
> at 
> org.apache.spark.storage.BlockManager.dropFromMemory(BlockManager.scala:1009)
> at 
> org.apache.spark.storage.MemoryStore$$anonfun$evictBlocksToFreeSpace$2.apply(MemoryStore.scala:460)
> at 
> org.apache.spark.storage.MemoryStore$$anonfun$evictBlocksToFreeSpace$2.apply(MemoryStore.scala:449)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


