[jira] [Updated] (SPARK-18361) Expose RDD localCheckpoint in PySpark
[ https://issues.apache.org/jira/browse/SPARK-18361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-18361:
------------------------------
    Assignee: Gabriel Huang

> Expose RDD localCheckpoint in PySpark
> -------------------------------------
>
> Key: SPARK-18361
> URL: https://issues.apache.org/jira/browse/SPARK-18361
> Project: Spark
> Issue Type: New Feature
> Components: PySpark
> Reporter: Gabriel Huang
> Assignee: Gabriel Huang
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> As of today, I cannot access rdd.localCheckpoint() in PySpark.
> This is an important issue for machine learning users, as we often have to iterate an algorithm and perform operations like joins in each iteration.
> If the lineage is not truncated, memory usage, lineage length, and computation time explode. rdd.localCheckpoint() seems like the most straightforward way of truncating the lineage, but the Python API does not expose it.
[jira] [Resolved] (SPARK-18361) Expose RDD localCheckpoint in PySpark
[ https://issues.apache.org/jira/browse/SPARK-18361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-18361.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 2.1.0
Target Version/s: 2.1.0

> Expose RDD localCheckpoint in PySpark
> -------------------------------------
>
> Key: SPARK-18361
> URL: https://issues.apache.org/jira/browse/SPARK-18361
> Project: Spark
> Issue Type: New Feature
> Components: PySpark
> Reporter: Gabriel Huang
> Assignee: Gabriel Huang
> Fix For: 2.1.0
>
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> As of today, I cannot access rdd.localCheckpoint() in PySpark.
> This is an important issue for machine learning users, as we often have to iterate an algorithm and perform operations like joins in each iteration.
> If the lineage is not truncated, memory usage, lineage length, and computation time explode. rdd.localCheckpoint() seems like the most straightforward way of truncating the lineage, but the Python API does not expose it.
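A minimal PySpark sketch of the iterative pattern described above, using the API this ticket exposes; the dataset, loop count, and checkpoint interval are illustrative, not from the ticket:

{code}
from pyspark import SparkContext

sc = SparkContext("local[*]", "local-checkpoint-demo")
rdd = sc.parallelize(range(1000)).map(lambda x: (x % 10, x))

for i in range(20):
    # Each iteration appends a stage to the lineage.
    rdd = rdd.mapValues(lambda v: v + 1)
    if i % 5 == 4:
        # Mark the RDD for local checkpointing; unlike checkpoint(),
        # this uses executor-local storage and needs no reliable
        # (e.g. HDFS) checkpoint directory.
        rdd.localCheckpoint()
        rdd.count()  # materialize so the lineage truncation takes effect
{code}

Note that localCheckpoint() is called here for its side effect; the marked RDD is materialized by the action that follows.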
[jira] [Resolved] (SPARK-18517) DROP TABLE IF EXISTS should not warn for non-existing tables
[ https://issues.apache.org/jira/browse/SPARK-18517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-18517.
-------------------------------
       Resolution: Fixed
         Assignee: Dongjoon Hyun
    Fix Version/s: 2.1.0
Target Version/s: 2.1.0

> DROP TABLE IF EXISTS should not warn for non-existing tables
> -------------------------------------------------------------
>
> Key: SPARK-18517
> URL: https://issues.apache.org/jira/browse/SPARK-18517
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.2
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Minor
> Fix For: 2.1.0
>
> Currently, `DROP TABLE IF EXISTS` logs a warning for non-existing tables. However, by the definition of the command, it should be quiet in this case.
> {code}
> scala> sql("DROP TABLE IF EXISTS nonexist")
> 16/11/20 20:48:26 WARN DropTableCommand:
> org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'nonexist' not found in database 'default';
> {code}
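A quick way to exercise the command from PySpark, assuming a Hive-enabled session (the table name is illustrative); after this fix it completes without the warning:

{code}
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
# By the definition of IF EXISTS, this should succeed silently
# whether or not the table exists.
spark.sql("DROP TABLE IF EXISTS nonexist")
{code}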
[jira] [Resolved] (SPARK-18507) Major performance regression in SHOW PARTITIONS on partitioned Hive tables
[ https://issues.apache.org/jira/browse/SPARK-18507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-18507.
-------------------------------
       Resolution: Fixed
         Assignee: Wenchen Fan
    Fix Version/s: 2.1.0
Target Version/s: 2.1.0

> Major performance regression in SHOW PARTITIONS on partitioned Hive tables
> ---------------------------------------------------------------------------
>
> Key: SPARK-18507
> URL: https://issues.apache.org/jira/browse/SPARK-18507
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.0
> Reporter: Michael Allman
> Assignee: Wenchen Fan
> Priority: Critical
> Fix For: 2.1.0
>
> Commit {{ccb11543048dccd4cc590a8db1df1d9d5847d112}} (https://github.com/apache/spark/commit/ccb11543048dccd4cc590a8db1df1d9d5847d112) appears to have introduced a major regression in the performance of the Hive {{SHOW PARTITIONS}} command. Running that command on a Hive table with 17,337 partitions in the {{spark-sql}} shell with the parent commit of {{ccb1154}} takes approximately 7.3 seconds. Running the same command with commit {{ccb1154}} takes approximately 250 seconds.
> I have not had the opportunity to complete a thorough investigation, but I suspect the problem lies in the diff hunk beginning at https://github.com/apache/spark/commit/ccb11543048dccd4cc590a8db1df1d9d5847d112#diff-159191585e10542f013cb3a714f26075L675. If that's the case, this performance issue should manifest itself in other areas, as this programming pattern was used elsewhere in this commit.
[jira] [Reopened] (SPARK-18050) spark 2.0.1 enable hive throw AlreadyExistsException(message:Database default already exists)
[ https://issues.apache.org/jira/browse/SPARK-18050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or reopened SPARK-18050:
-------------------------------
    Assignee: Wenchen Fan

> spark 2.0.1 enable hive throw AlreadyExistsException(message:Database default already exists)
> ----------------------------------------------------------------------------------------------
>
> Key: SPARK-18050
> URL: https://issues.apache.org/jira/browse/SPARK-18050
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Environment: jdk1.8, macOS, spark 2.0.1
> Reporter: todd.chen
> Assignee: Wenchen Fan
>
> In Spark 2.0.1, I enable Hive support, and when the sqlContext is initialized it throws an AlreadyExistsException(message:Database default already exists), the same as https://www.mail-archive.com/dev@spark.apache.org/msg15306.html. My code is:
> {code}
> private val master = "local[*]"
> private val appName = "xqlServerSpark"
> val fileSystem = FileSystem.get()
> val sparkConf = new SparkConf().setMaster(master).
>   setAppName(appName).set("spark.sql.warehouse.dir",
>     s"${fileSystem.getUri.toASCIIString}/user/hive/warehouse")
> val hiveContext =
>   SparkSession.builder().config(sparkConf).enableHiveSupport().getOrCreate().sqlContext
> print(sparkConf.get("spark.sql.warehouse.dir"))
> hiveContext.sql("show tables").show()
> {code}
> The result is correct, but an exception is also thrown by the code.
[jira] [Resolved] (SPARK-18050) spark 2.0.1 enable hive throw AlreadyExistsException(message:Database default already exists)
[ https://issues.apache.org/jira/browse/SPARK-18050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-18050.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 2.1.0
Target Version/s: 2.1.0

> spark 2.0.1 enable hive throw AlreadyExistsException(message:Database default already exists)
> ----------------------------------------------------------------------------------------------
>
> Key: SPARK-18050
> URL: https://issues.apache.org/jira/browse/SPARK-18050
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Environment: jdk1.8, macOS, spark 2.0.1
> Reporter: todd.chen
> Assignee: Wenchen Fan
> Fix For: 2.1.0
>
> In Spark 2.0.1, I enable Hive support, and when the sqlContext is initialized it throws an AlreadyExistsException(message:Database default already exists), the same as https://www.mail-archive.com/dev@spark.apache.org/msg15306.html. My code is:
> {code}
> private val master = "local[*]"
> private val appName = "xqlServerSpark"
> val fileSystem = FileSystem.get()
> val sparkConf = new SparkConf().setMaster(master).
>   setAppName(appName).set("spark.sql.warehouse.dir",
>     s"${fileSystem.getUri.toASCIIString}/user/hive/warehouse")
> val hiveContext =
>   SparkSession.builder().config(sparkConf).enableHiveSupport().getOrCreate().sqlContext
> print(sparkConf.get("spark.sql.warehouse.dir"))
> hiveContext.sql("show tables").show()
> {code}
> The result is correct, but an exception is also thrown by the code.
[jira] [Created] (SPARK-14893) Re-enable HiveSparkSubmitSuite SPARK-8489 test after HiveContext is removed
Andrew Or created SPARK-14893:
------------------------------

Summary: Re-enable HiveSparkSubmitSuite SPARK-8489 test after HiveContext is removed
Key: SPARK-14893
URL: https://issues.apache.org/jira/browse/SPARK-14893
Project: Spark
Issue Type: Bug
Components: SQL, Tests
Affects Versions: 2.0.0
Reporter: Andrew Or

The test was disabled in https://github.com/apache/spark/pull/12585. To re-enable it we need to rebuild the jar using the updated source code.
[jira] [Created] (SPARK-14895) SparkSession Python API
Andrew Or created SPARK-14895:
------------------------------

Summary: SparkSession Python API
Key: SPARK-14895
URL: https://issues.apache.org/jira/browse/SPARK-14895
Project: Spark
Issue Type: New Feature
Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or
[jira] [Created] (SPARK-14896) Remove HiveContext in Python
Andrew Or created SPARK-14896:
------------------------------

Summary: Remove HiveContext in Python
Key: SPARK-14896
URL: https://issues.apache.org/jira/browse/SPARK-14896
Project: Spark
Issue Type: Sub-task
Components: PySpark, SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or
[jira] [Resolved] (SPARK-14721) Remove the HiveContext class
[ https://issues.apache.org/jira/browse/SPARK-14721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-14721.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0

> Remove the HiveContext class
> ----------------------------
>
> Key: SPARK-14721
> URL: https://issues.apache.org/jira/browse/SPARK-14721
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Andrew Or
> Assignee: Andrew Or
> Fix For: 2.0.0
[jira] [Created] (SPARK-14902) Expose user-facing RuntimeConfig in SparkSession
Andrew Or created SPARK-14902:
------------------------------

Summary: Expose user-facing RuntimeConfig in SparkSession
Key: SPARK-14902
URL: https://issues.apache.org/jira/browse/SPARK-14902
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or
[jira] [Created] (SPARK-14904) Add back HiveContext in compatibility package
Andrew Or created SPARK-14904:
------------------------------

Summary: Add back HiveContext in compatibility package
Key: SPARK-14904
URL: https://issues.apache.org/jira/browse/SPARK-14904
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or
[jira] [Created] (SPARK-14940) Move ExternalCatalog to own file
Andrew Or created SPARK-14940:
------------------------------

Summary: Move ExternalCatalog to own file
Key: SPARK-14940
URL: https://issues.apache.org/jira/browse/SPARK-14940
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or
Priority: Minor
[jira] [Created] (SPARK-14945) Python SparkSession API
Andrew Or created SPARK-14945:
------------------------------

Summary: Python SparkSession API
Key: SPARK-14945
URL: https://issues.apache.org/jira/browse/SPARK-14945
Project: Spark
Issue Type: Sub-task
Components: PySpark, SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
[jira] [Assigned] (SPARK-14945) Python SparkSession API
[ https://issues.apache.org/jira/browse/SPARK-14945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or reassigned SPARK-14945:
---------------------------------
    Assignee: Andrew Or

> Python SparkSession API
> -----------------------
>
> Key: SPARK-14945
> URL: https://issues.apache.org/jira/browse/SPARK-14945
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark, SQL
> Affects Versions: 2.0.0
> Reporter: Andrew Or
> Assignee: Andrew Or
[jira] [Commented] (SPARK-14915) Tasks that fail due to CommitDeniedException (a side-effect of speculation) can cause job to never complete
[ https://issues.apache.org/jira/browse/SPARK-14915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260623#comment-15260623 ]

Andrew Or commented on SPARK-14915:
-----------------------------------

I haven't looked into the scheduler code in detail yet, but it seems to me the bug is not caused by your fix to use the `CausedBy`. Rather, the bug has always existed, and your fix just uncovered it. It does seem like a problem in the scheduler; under no circumstances should we retry a stage without limits.

> Tasks that fail due to CommitDeniedException (a side-effect of speculation) can cause job to never complete
> ------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-14915
> URL: https://issues.apache.org/jira/browse/SPARK-14915
> Project: Spark
> Issue Type: Bug
> Affects Versions: 1.6.2
> Reporter: Jason Moore
> Priority: Critical
>
> In SPARK-14357, code was corrected towards the originally intended behavior that a CommitDeniedException should not count towards the failure count for a job. After having run with this fix for a few weeks, it has become apparent that this behavior has some unintended consequences: a speculative task will continuously receive a CDE from the driver, now causing it to fail and retry over and over without limit.
> I'm thinking we could put a task that receives a CDE from the driver into a TaskState.FINISHED or some other state to indicate that the task shouldn't be resubmitted by the TaskScheduler. I'd probably need some opinions on whether there are other consequences of doing something like this.
[jira] [Resolved] (SPARK-14014) Replace existing analysis.Catalog with SessionCatalog
[ https://issues.apache.org/jira/browse/SPARK-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-14014.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0

> Replace existing analysis.Catalog with SessionCatalog
> ------------------------------------------------------
>
> Key: SPARK-14014
> URL: https://issues.apache.org/jira/browse/SPARK-14014
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Andrew Or
> Assignee: Andrew Or
> Fix For: 2.0.0
>
> As of this moment, there exist many catalogs in Spark. For Spark 2.0, we will have two high-level catalogs only: SessionCatalog and ExternalCatalog. SessionCatalog (implemented in SPARK-13923) keeps track of temporary functions and tables and delegates other operations to ExternalCatalog.
> At the same time, there's this legacy catalog called `analysis.Catalog` that also tracks temporary functions and tables. The goal is to get rid of this legacy catalog and replace it with SessionCatalog, which is the new thing.
[jira] [Commented] (SPARK-14014) Replace existing analysis.Catalog with SessionCatalog
[ https://issues.apache.org/jira/browse/SPARK-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261177#comment-15261177 ]

Andrew Or commented on SPARK-14014:
-----------------------------------

Pretty sure this was fixed. :)

> Replace existing analysis.Catalog with SessionCatalog
> ------------------------------------------------------
>
> Key: SPARK-14014
> URL: https://issues.apache.org/jira/browse/SPARK-14014
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Andrew Or
> Assignee: Andrew Or
> Fix For: 2.0.0
>
> As of this moment, there exist many catalogs in Spark. For Spark 2.0, we will have two high-level catalogs only: SessionCatalog and ExternalCatalog. SessionCatalog (implemented in SPARK-13923) keeps track of temporary functions and tables and delegates other operations to ExternalCatalog.
> At the same time, there's this legacy catalog called `analysis.Catalog` that also tracks temporary functions and tables. The goal is to get rid of this legacy catalog and replace it with SessionCatalog, which is the new thing.
[jira] [Updated] (SPARK-14988) Implement catalog and conf API in Python SparkSession
[ https://issues.apache.org/jira/browse/SPARK-14988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-14988:
------------------------------
    Description: This is like implementing SPARK-13477 and SPARK-13487 in the Python SparkSession.

> Implement catalog and conf API in Python SparkSession
> ------------------------------------------------------
>
> Key: SPARK-14988
> URL: https://issues.apache.org/jira/browse/SPARK-14988
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark, SQL
> Affects Versions: 2.0.0
> Reporter: Andrew Or
> Assignee: Andrew Or
>
> This is like implementing SPARK-13477 and SPARK-13487 in the Python SparkSession.
[jira] [Created] (SPARK-14988) Implement catalog and conf API in Python SparkSession
Andrew Or created SPARK-14988:
------------------------------

Summary: Implement catalog and conf API in Python SparkSession
Key: SPARK-14988
URL: https://issues.apache.org/jira/browse/SPARK-14988
Project: Spark
Issue Type: Sub-task
Components: PySpark, SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or
[jira] [Resolved] (SPARK-14988) Implement catalog and conf API in Python SparkSession
[ https://issues.apache.org/jira/browse/SPARK-14988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-14988.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0

> Implement catalog and conf API in Python SparkSession
> ------------------------------------------------------
>
> Key: SPARK-14988
> URL: https://issues.apache.org/jira/browse/SPARK-14988
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark, SQL
> Affects Versions: 2.0.0
> Reporter: Andrew Or
> Assignee: Andrew Or
> Fix For: 2.0.0
>
> This is like implementing SPARK-13477 and SPARK-13487 in the Python SparkSession.
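A rough sketch of the two handles this sub-task adds to the Python SparkSession, mirroring the Scala side (SPARK-13477 for the catalog, SPARK-13487 for the conf); the config value is illustrative:

{code}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Per-session runtime configuration.
spark.conf.set("spark.sql.shuffle.partitions", "8")
print(spark.conf.get("spark.sql.shuffle.partitions"))

# Catalog metadata: enumerate tables visible to this session.
for table in spark.catalog.listTables():
    print(table.name, table.isTemporary)
{code}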
[jira] [Created] (SPARK-15012) Simplify configuration API further
Andrew Or created SPARK-15012:
------------------------------

Summary: Simplify configuration API further
Key: SPARK-15012
URL: https://issues.apache.org/jira/browse/SPARK-15012
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or

Proposals:
(1) Remove all the setConf, getConf etc. Just expose `spark.conf`.
(2) Make `spark.conf` take in things set in the core `SparkConf` as well; otherwise users may get confused.
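A sketch of what proposal (2) implies for users, assuming it is adopted as described: values set on the core SparkConf become readable through `spark.conf` (the master and app name here are illustrative):

{code}
from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = SparkConf().setMaster("local[*]").setAppName("conf-demo")
spark = SparkSession.builder.config(conf=conf).getOrCreate()

# Under proposal (2), a core SparkConf entry is visible here too.
print(spark.conf.get("spark.app.name"))
{code}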
[jira] [Reopened] (SPARK-14896) Remove HiveContext in Python
[ https://issues.apache.org/jira/browse/SPARK-14896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or reopened SPARK-14896:
-------------------------------

> Remove HiveContext in Python
> ----------------------------
>
> Key: SPARK-14896
> URL: https://issues.apache.org/jira/browse/SPARK-14896
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark, SQL
> Affects Versions: 2.0.0
> Reporter: Andrew Or
> Assignee: Andrew Or
[jira] [Closed] (SPARK-14896) Remove HiveContext in Python
[ https://issues.apache.org/jira/browse/SPARK-14896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or closed SPARK-14896.
-----------------------------
    Resolution: Won't Fix

> Remove HiveContext in Python
> ----------------------------
>
> Key: SPARK-14896
> URL: https://issues.apache.org/jira/browse/SPARK-14896
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark, SQL
> Affects Versions: 2.0.0
> Reporter: Andrew Or
> Assignee: Andrew Or
[jira] [Resolved] (SPARK-14673) Remove HiveContext
[ https://issues.apache.org/jira/browse/SPARK-14673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-14673.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0

> Remove HiveContext
> ------------------
>
> Key: SPARK-14673
> URL: https://issues.apache.org/jira/browse/SPARK-14673
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Andrew Or
> Assignee: Andrew Or
> Priority: Blocker
> Fix For: 2.0.0
>
> In Spark 2.0, we will have a new SparkSession that can run commands against the Hive metastore. The metastore will be initialized lazily so as not to slow down the initialization of spark-shell. This is the first step towards that.
[jira] [Updated] (SPARK-14896) Deprecate HiveContext in Python
[ https://issues.apache.org/jira/browse/SPARK-14896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-14896:
------------------------------
    Summary: Deprecate HiveContext in Python  (was: Remove HiveContext in Python)

> Deprecate HiveContext in Python
> -------------------------------
>
> Key: SPARK-14896
> URL: https://issues.apache.org/jira/browse/SPARK-14896
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark, SQL
> Affects Versions: 2.0.0
> Reporter: Andrew Or
> Assignee: Andrew Or
[jira] [Resolved] (SPARK-15019) Propagate all Spark Confs to HiveConf created in HiveClientImpl
[ https://issues.apache.org/jira/browse/SPARK-15019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-15019.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0

> Propagate all Spark Confs to HiveConf created in HiveClientImpl
> ---------------------------------------------------------------
>
> Key: SPARK-15019
> URL: https://issues.apache.org/jira/browse/SPARK-15019
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Yin Huai
> Assignee: Yin Huai
> Fix For: 2.0.0
>
> Right now, the HiveConf created in HiveClientImpl only takes conf set at runtime or set in hive-site.xml. We should also propagate Spark confs to it, so that users do not have to use hive-site.xml to set the warehouse location and metastore URL.
[jira] [Closed] (SPARK-14895) SparkSession Python API
[ https://issues.apache.org/jira/browse/SPARK-14895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or closed SPARK-14895.
-----------------------------
    Resolution: Duplicate

> SparkSession Python API
> -----------------------
>
> Key: SPARK-14895
> URL: https://issues.apache.org/jira/browse/SPARK-14895
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Andrew Or
> Assignee: Andrew Or
[jira] [Created] (SPARK-15068) Use proper metastore warehouse path
Andrew Or created SPARK-15068:
------------------------------

Summary: Use proper metastore warehouse path
Key: SPARK-15068
URL: https://issues.apache.org/jira/browse/SPARK-15068
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or

Today, we use "" empty string without Hive, and "/user/hive/warehouse", which is not a great default since it probably doesn't exist on the box. Instead, it would be better to use a subdir inside `user.dir` or something.
[jira] [Updated] (SPARK-15068) Use proper metastore warehouse path
[ https://issues.apache.org/jira/browse/SPARK-15068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-15068:
------------------------------
    Description:
Today, we use "" empty string without Hive, and "/user/hive/warehouse" with Hive, which is not a great default since it probably doesn't exist on the box. Instead, it would be better to use a subdir inside `user.dir` or something.

  (was: Today, we use "" empty string without Hive, and "/user/hive/warehouse", which is not a great default since it probably doesn't exist on the box. Instead, it would be better to use a subdir inside `user.dir` or something.)

> Use proper metastore warehouse path
> -----------------------------------
>
> Key: SPARK-15068
> URL: https://issues.apache.org/jira/browse/SPARK-15068
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Andrew Or
> Assignee: Andrew Or
>
> Today, we use "" empty string without Hive, and "/user/hive/warehouse" with Hive, which is not a great default since it probably doesn't exist on the box. Instead, it would be better to use a subdir inside `user.dir` or something.
[jira] [Resolved] (SPARK-15068) Use proper metastore warehouse path
[ https://issues.apache.org/jira/browse/SPARK-15068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-15068.
-------------------------------
    Resolution: Duplicate

> Use proper metastore warehouse path
> -----------------------------------
>
> Key: SPARK-15068
> URL: https://issues.apache.org/jira/browse/SPARK-15068
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Andrew Or
> Assignee: Andrew Or
>
> Today, we use "" empty string without Hive, and "/user/hive/warehouse" with Hive, which is not a great default since it probably doesn't exist on the box. Instead, it would be better to use a subdir inside `user.dir` or something.
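A sketch of the direction discussed above: point the warehouse at a directory under the working directory so the default need not pre-exist on the box (the path and app name are illustrative):

{code}
import os
from pyspark.sql import SparkSession

warehouse = os.path.join(os.getcwd(), "spark-warehouse")
spark = (SparkSession.builder
         .appName("warehouse-demo")
         .config("spark.sql.warehouse.dir", warehouse)
         .getOrCreate())
print(spark.conf.get("spark.sql.warehouse.dir"))
{code}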
[jira] [Updated] (SPARK-15037) Use SparkSession instead of SQLContext in testsuites
[ https://issues.apache.org/jira/browse/SPARK-15037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-15037:
------------------------------
    Summary: Use SparkSession instead of SQLContext in testsuites  (was: Use SparkSession instread of SQLContext in testsuites)

> Use SparkSession instead of SQLContext in testsuites
> -----------------------------------------------------
>
> Key: SPARK-15037
> URL: https://issues.apache.org/jira/browse/SPARK-15037
> Project: Spark
> Issue Type: Sub-task
> Reporter: Dongjoon Hyun
>
> This issue aims to update the existing testsuites to use `SparkSession` instead of `SQLContext`, since `SQLContext` exists just for backward compatibility.
[jira] [Assigned] (SPARK-15073) Make SparkSession constructors private
[ https://issues.apache.org/jira/browse/SPARK-15073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or reassigned SPARK-15073:
---------------------------------
    Assignee: Andrew Or

> Make SparkSession constructors private
> --------------------------------------
>
> Key: SPARK-15073
> URL: https://issues.apache.org/jira/browse/SPARK-15073
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Reynold Xin
> Assignee: Andrew Or
> Fix For: 2.0.0
>
> So users have to use the Builder pattern.
[jira] [Updated] (SPARK-15084) Use builder pattern to create SparkSession in PySpark
[ https://issues.apache.org/jira/browse/SPARK-15084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-15084:
------------------------------
    Assignee: Dongjoon Hyun

> Use builder pattern to create SparkSession in PySpark
> ------------------------------------------------------
>
> Key: SPARK-15084
> URL: https://issues.apache.org/jira/browse/SPARK-15084
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 2.0.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
>
> This is a Python port of SPARK-15052.
[jira] [Updated] (SPARK-15072) Remove SparkSession.withHiveSupport
[ https://issues.apache.org/jira/browse/SPARK-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-15072:
------------------------------
    Assignee: Sandeep Singh

> Remove SparkSession.withHiveSupport
> -----------------------------------
>
> Key: SPARK-15072
> URL: https://issues.apache.org/jira/browse/SPARK-15072
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Reynold Xin
> Assignee: Sandeep Singh
> Fix For: 2.0.0
[jira] [Resolved] (SPARK-14422) Improve handling of optional configs in SQLConf
[ https://issues.apache.org/jira/browse/SPARK-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-14422.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Improve handling of optional configs in SQLConf
> ------------------------------------------------
>
> Key: SPARK-14422
> URL: https://issues.apache.org/jira/browse/SPARK-14422
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Marcelo Vanzin
> Priority: Minor
> Fix For: 2.0.0
>
> As Michael showed here: https://github.com/apache/spark/pull/12119/files/69aa1a005cc7003ab62d6dfcdef42181b053eaed#r58634150
> Handling of optional configs in SQLConf is a little sub-optimal right now. We should clean that up.
[jira] [Updated] (SPARK-14422) Improve handling of optional configs in SQLConf
[ https://issues.apache.org/jira/browse/SPARK-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-14422:
------------------------------
    Assignee: Sandeep Singh

> Improve handling of optional configs in SQLConf
> ------------------------------------------------
>
> Key: SPARK-14422
> URL: https://issues.apache.org/jira/browse/SPARK-14422
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Marcelo Vanzin
> Assignee: Sandeep Singh
> Priority: Minor
> Fix For: 2.0.0
>
> As Michael showed here: https://github.com/apache/spark/pull/12119/files/69aa1a005cc7003ab62d6dfcdef42181b053eaed#r58634150
> Handling of optional configs in SQLConf is a little sub-optimal right now. We should clean that up.
[jira] [Resolved] (SPARK-14645) non local Python resource doesn't work with Mesos cluster mode
[ https://issues.apache.org/jira/browse/SPARK-14645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-14645.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> non local Python resource doesn't work with Mesos cluster mode
> ---------------------------------------------------------------
>
> Key: SPARK-14645
> URL: https://issues.apache.org/jira/browse/SPARK-14645
> Project: Spark
> Issue Type: Bug
> Reporter: Timothy Chen
> Fix For: 2.0.0
>
> Currently SparkSubmit explicitly disallows non-local Python resources in cluster mode with Mesos, although this is actually supported.
[jira] [Resolved] (SPARK-15084) Use builder pattern to create SparkSession in PySpark
[ https://issues.apache.org/jira/browse/SPARK-15084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-15084.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Use builder pattern to create SparkSession in PySpark
> ------------------------------------------------------
>
> Key: SPARK-15084
> URL: https://issues.apache.org/jira/browse/SPARK-15084
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 2.0.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Fix For: 2.0.0
>
> This is a Python port of SPARK-15052.
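The builder pattern ported here, in a minimal form (master, app name, and config value are illustrative):

{code}
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("local[*]")
         .appName("builder-demo")
         .config("spark.sql.shuffle.partitions", "4")
         .getOrCreate())

spark.range(10).show()
{code}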
[jira] [Resolved] (SPARK-15097) Import fails for someDataset.sqlContext.implicits._
[ https://issues.apache.org/jira/browse/SPARK-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-15097.
-------------------------------
       Resolution: Fixed
         Assignee: Koert Kuipers
    Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Import fails for someDataset.sqlContext.implicits._
> ----------------------------------------------------
>
> Key: SPARK-15097
> URL: https://issues.apache.org/jira/browse/SPARK-15097
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Environment: spark-2.0.0-SNAPSHOT
> Reporter: koert kuipers
> Assignee: Koert Kuipers
> Fix For: 2.0.0
>
> With the introduction of SparkSession, SQLContext changed from being a lazy val to a def inside Dataset. However, this is troublesome if you want to do:
> import someDataset.sqlContext.implicits._
> You get this error:
> stable identifier required, but someDataset.sqlContext.implicits found.
[jira] [Resolved] (SPARK-14414) Make error messages consistent across DDLs
[ https://issues.apache.org/jira/browse/SPARK-14414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-14414.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0

> Make error messages consistent across DDLs
> -------------------------------------------
>
> Key: SPARK-14414
> URL: https://issues.apache.org/jira/browse/SPARK-14414
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Andrew Or
> Assignee: Andrew Or
> Fix For: 2.0.0
>
> There are many different error messages right now when the user tries to run something that's not supported. We might throw AnalysisException or ParseException or NoSuchFunctionException etc. We should make all of these consistent before 2.0.
[jira] [Updated] (SPARK-14645) non local Python resource doesn't work with Mesos cluster mode
[ https://issues.apache.org/jira/browse/SPARK-14645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-14645:
------------------------------
    Assignee: Timothy Chen

> non local Python resource doesn't work with Mesos cluster mode
> ---------------------------------------------------------------
>
> Key: SPARK-14645
> URL: https://issues.apache.org/jira/browse/SPARK-14645
> Project: Spark
> Issue Type: Bug
> Reporter: Timothy Chen
> Assignee: Timothy Chen
> Fix For: 2.0.0
>
> Currently SparkSubmit explicitly disallows non-local Python resources in cluster mode with Mesos, although this is actually supported.
[jira] [Resolved] (SPARK-13269) Expose more executor stats in stable status API
[ https://issues.apache.org/jira/browse/SPARK-13269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-13269.
-------------------------------
       Resolution: Fixed
         Assignee: Wenchen Fan
    Fix Version/s: 2.0.0

> Expose more executor stats in stable status API
> ------------------------------------------------
>
> Key: SPARK-13269
> URL: https://issues.apache.org/jira/browse/SPARK-13269
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Reporter: Andrew Or
> Assignee: Wenchen Fan
> Fix For: 2.0.0
>
> Currently the stable status API is quite limited; it exposes only a small subset of the things exposed by JobProgressListener. It is useful for very high-level querying but falls short when the developer wants to build an application on top of Spark with more integration.
> In this issue I propose that we expose at least two things:
> - Which executors are running tasks, and
> - Which executors cached how much in memory and on disk
> The goal is not to expose exactly these two things, but to expose something that would allow the developer to learn about them. These concepts are fundamental to Spark's design, so there's almost no chance that they will go away in the future.
[jira] [Commented] (SPARK-13269) Expose more executor stats in stable status API
[ https://issues.apache.org/jira/browse/SPARK-13269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15270067#comment-15270067 ]

Andrew Or commented on SPARK-13269:
-----------------------------------

Oops, actually this was already done in SPARK-14069. Closing this as a duplicate.

> Expose more executor stats in stable status API
> ------------------------------------------------
>
> Key: SPARK-13269
> URL: https://issues.apache.org/jira/browse/SPARK-13269
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Reporter: Andrew Or
> Fix For: 2.0.0
>
> Currently the stable status API is quite limited; it exposes only a small subset of the things exposed by JobProgressListener. It is useful for very high-level querying but falls short when the developer wants to build an application on top of Spark with more integration.
> In this issue I propose that we expose at least two things:
> - Which executors are running tasks, and
> - Which executors cached how much in memory and on disk
> The goal is not to expose exactly these two things, but to expose something that would allow the developer to learn about them. These concepts are fundamental to Spark's design, so there's almost no chance that they will go away in the future.
[jira] [Resolved] (SPARK-12299) Remove history serving functionality from standalone Master
[ https://issues.apache.org/jira/browse/SPARK-12299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-12299.
-------------------------------
       Resolution: Fixed
         Assignee: Bryan Cutler
    Fix Version/s: 2.0.0

> Remove history serving functionality from standalone Master
> ------------------------------------------------------------
>
> Key: SPARK-12299
> URL: https://issues.apache.org/jira/browse/SPARK-12299
> Project: Spark
> Issue Type: Sub-task
> Components: Deploy
> Affects Versions: 1.0.0
> Reporter: Andrew Or
> Assignee: Bryan Cutler
> Fix For: 2.0.0
>
> The standalone Master currently continues to serve the historical UIs of applications that have completed and enabled event logging. This poses problems, however, if the event log is very large, e.g. SPARK-6270. The Master might OOM or hang while it rebuilds the UI, rejecting applications in the meantime.
> Personally, I have had to make modifications in the code to disable this myself, because I wanted to use event logging in standalone mode for applications that produce a lot of logging.
> Removing this from the Master would simplify the process significantly. This issue supersedes SPARK-12062.
[jira] [Resolved] (SPARK-15126) RuntimeConfig.set should return Unit rather than RuntimeConfig itself
[ https://issues.apache.org/jira/browse/SPARK-15126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-15126.
-------------------------------
    Resolution: Fixed

> RuntimeConfig.set should return Unit rather than RuntimeConfig itself
> ----------------------------------------------------------------------
>
> Key: SPARK-15126
> URL: https://issues.apache.org/jira/browse/SPARK-15126
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Reynold Xin
> Assignee: Reynold Xin
> Fix For: 2.0.0
>
> Currently we return RuntimeConfig itself to facilitate chaining. However, it makes the output in interactive environments (e.g. notebooks, the Scala REPL) weird.
[jira] [Resolved] (SPARK-15121) Improve logging of external shuffle handler
[ https://issues.apache.org/jira/browse/SPARK-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-15121.
-------------------------------
       Resolution: Fixed
         Assignee: Thomas Graves
    Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Improve logging of external shuffle handler
> --------------------------------------------
>
> Key: SPARK-15121
> URL: https://issues.apache.org/jira/browse/SPARK-15121
> Project: Spark
> Issue Type: Improvement
> Components: Shuffle
> Affects Versions: 1.6.1
> Reporter: Thomas Graves
> Assignee: Thomas Graves
> Fix For: 2.0.0
>
> I want to get more information about who is connecting to the Spark external shuffle handler, so I want to enhance the OpenBlocks call in ExternalShuffleBlockHandler.
[jira] [Updated] (SPARK-15121) Improve logging of external shuffle handler
[ https://issues.apache.org/jira/browse/SPARK-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-15121:
------------------------------
    Priority: Minor  (was: Major)

> Improve logging of external shuffle handler
> --------------------------------------------
>
> Key: SPARK-15121
> URL: https://issues.apache.org/jira/browse/SPARK-15121
> Project: Spark
> Issue Type: Improvement
> Components: Shuffle
> Affects Versions: 1.6.1
> Reporter: Thomas Graves
> Assignee: Thomas Graves
> Priority: Minor
> Fix For: 2.0.0
>
> I want to get more information about who is connecting to the Spark external shuffle handler, so I want to enhance the OpenBlocks call in ExternalShuffleBlockHandler.
[jira] [Resolved] (SPARK-13001) Coarse-grained Mesos scheduler should reject offers for longer period of time when reached max cores
[ https://issues.apache.org/jira/browse/SPARK-13001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-13001.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Coarse-grained Mesos scheduler should reject offers for longer period of time when reached max cores
> ------------------------------------------------------------------------------------------------------
>
> Key: SPARK-13001
> URL: https://issues.apache.org/jira/browse/SPARK-13001
> Project: Spark
> Issue Type: Improvement
> Components: Mesos, Scheduler
> Affects Versions: 1.5.0, 1.5.1, 1.5.2, 1.6.0
> Reporter: Sebastien Rainville
> Fix For: 2.0.0
>
> Similar to https://issues.apache.org/jira/browse/SPARK-10471 but for "reached max cores".
> The coarse-grained Mesos scheduler accepts every offer that matches the requirements until it reaches "spark.cores.max", at which point it rejects every offer. Even though Spark will never launch tasks on these offers, Mesos keeps sending the same offers again every 5 seconds, making them unavailable to other frameworks.
> Spark should reject those offers for a longer period of time to prevent offer starvation when running a lot of frameworks.
[jira] [Resolved] (SPARK-15031) Use SparkSession in Scala/Python/Java example.
[ https://issues.apache.org/jira/browse/SPARK-15031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-15031.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Use SparkSession in Scala/Python/Java example.
> -----------------------------------------------
>
> Key: SPARK-15031
> URL: https://issues.apache.org/jira/browse/SPARK-15031
> Project: Spark
> Issue Type: Sub-task
> Components: Examples
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Fix For: 2.0.0
>
> This PR aims to update Scala/Python/Java examples by replacing `SQLContext` with the newly added `SparkSession`.
> - Use the *SparkSession Builder Pattern* in 154 (Scala 55, Java 52, Python 47) files.
> - Add `getConf` to the Python `SparkContext` class: python/pyspark/context.py
> - Replace the *SQLContext Singleton Pattern* with the *SparkSession Singleton Pattern*:
>   - `SqlNetworkWordCount.scala`
>   - `JavaSqlNetworkWordCount.java`
>   - `sql_network_wordcount.py`
> Now, `SQLContext` is used only in R examples and the following two Python examples. The Python examples are untouched in this PR since they already fail with some unknown issue.
> - `simple_params_example.py`
> - `aft_survival_regression.py`
[jira] [Updated] (SPARK-13001) Coarse-grained Mesos scheduler should reject offers for longer period of time when reached max cores
[ https://issues.apache.org/jira/browse/SPARK-13001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-13001:
------------------------------
    Assignee: Sebastien Rainville

> Coarse-grained Mesos scheduler should reject offers for longer period of time when reached max cores
> ------------------------------------------------------------------------------------------------------
>
> Key: SPARK-13001
> URL: https://issues.apache.org/jira/browse/SPARK-13001
> Project: Spark
> Issue Type: Improvement
> Components: Mesos, Scheduler
> Affects Versions: 1.5.0, 1.5.1, 1.5.2, 1.6.0
> Reporter: Sebastien Rainville
> Assignee: Sebastien Rainville
> Fix For: 2.0.0
>
> Similar to https://issues.apache.org/jira/browse/SPARK-10471 but for "reached max cores".
> The coarse-grained Mesos scheduler accepts every offer that matches the requirements until it reaches "spark.cores.max", at which point it rejects every offer. Even though Spark will never launch tasks on these offers, Mesos keeps sending the same offers again every 5 seconds, making them unavailable to other frameworks.
> Spark should reject those offers for a longer period of time to prevent offer starvation when running a lot of frameworks.
[jira] [Resolved] (SPARK-15116) In REPL we should create SparkSession first and get SparkContext from it
[ https://issues.apache.org/jira/browse/SPARK-15116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-15116.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0

> In REPL we should create SparkSession first and get SparkContext from it
> --------------------------------------------------------------------------
>
> Key: SPARK-15116
> URL: https://issues.apache.org/jira/browse/SPARK-15116
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.0.0
> Reporter: Wenchen Fan
> Assignee: Wenchen Fan
> Fix For: 2.0.0
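The initialization order this establishes, sketched in PySpark (the app name is illustrative): build the session first, then derive the context from it rather than constructing a SparkContext up front.

{code}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("repl-style-init").getOrCreate()
sc = spark.sparkContext  # obtained from the session, not created separately
print(sc.appName)
{code}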
[jira] [Updated] (SPARK-15045) Remove dead code in TaskMemoryManager.cleanUpAllAllocatedMemory for pageTable
[ https://issues.apache.org/jira/browse/SPARK-15045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-15045:
------------------------------
    Priority: Major  (was: Trivial)

> Remove dead code in TaskMemoryManager.cleanUpAllAllocatedMemory for pageTable
> ------------------------------------------------------------------------------
>
> Key: SPARK-15045
> URL: https://issues.apache.org/jira/browse/SPARK-15045
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.0.0
> Reporter: Jacek Laskowski
>
> Unless my eyes trick me, {{TaskMemoryManager}} first clears up {{pageTable}} in a synchronized block, and right after the block it does it again. I think the outside cleaning is dead code.
> See https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java#L382-L397 with the relevant snippet pasted below:
> {code}
> public long cleanUpAllAllocatedMemory() {
>   synchronized (this) {
>     Arrays.fill(pageTable, null);
>     ...
>   }
>   for (MemoryBlock page : pageTable) {
>     if (page != null) {
>       memoryManager.tungstenMemoryAllocator().free(page);
>     }
>   }
>   Arrays.fill(pageTable, null);
>   ...
> {code}
[jira] [Updated] (SPARK-15037) Use SparkSession instead of SQLContext in testsuites
[ https://issues.apache.org/jira/browse/SPARK-15037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-15037:
------------------------------
    Assignee: Sandeep Singh

> Use SparkSession instead of SQLContext in testsuites
> -----------------------------------------------------
>
> Key: SPARK-15037
> URL: https://issues.apache.org/jira/browse/SPARK-15037
> Project: Spark
> Issue Type: Sub-task
> Reporter: Dongjoon Hyun
> Assignee: Sandeep Singh
>
> This issue aims to update the existing testsuites to use `SparkSession` instead of `SQLContext`, since `SQLContext` exists just for backward compatibility.
[jira] [Resolved] (SPARK-14896) Deprecate HiveContext in Python
[ https://issues.apache.org/jira/browse/SPARK-14896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-14896.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0

> Deprecate HiveContext in Python
> -------------------------------
>
> Key: SPARK-14896
> URL: https://issues.apache.org/jira/browse/SPARK-14896
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark, SQL
> Affects Versions: 2.0.0
> Reporter: Andrew Or
> Assignee: Andrew Or
> Priority: Critical
> Fix For: 2.0.0
[jira] [Resolved] (SPARK-15072) Remove SparkSession.withHiveSupport
[ https://issues.apache.org/jira/browse/SPARK-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-15072.
-------------------------------
    Resolution: Fixed

> Remove SparkSession.withHiveSupport
> -----------------------------------
>
> Key: SPARK-15072
> URL: https://issues.apache.org/jira/browse/SPARK-15072
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Reynold Xin
> Assignee: Sandeep Singh
> Fix For: 2.0.0
[jira] [Resolved] (SPARK-15135) Make sure SparkSession thread safe
[ https://issues.apache.org/jira/browse/SPARK-15135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-15135.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0

> Make sure SparkSession thread safe
> ----------------------------------
>
> Key: SPARK-15135
> URL: https://issues.apache.org/jira/browse/SPARK-15135
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Shixiong Zhu
> Assignee: Shixiong Zhu
> Fix For: 2.0.0
>
> Fixed non-thread-safe classes used by SparkSession.
[jira] [Resolved] (SPARK-15134) Indent SparkSession builder patterns and update binary_classification_metrics_example.py
[ https://issues.apache.org/jira/browse/SPARK-15134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-15134.
-------------------------------
       Resolution: Fixed
         Assignee: Dongjoon Hyun
    Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Indent SparkSession builder patterns and update binary_classification_metrics_example.py
> ------------------------------------------------------------------------------------------
>
> Key: SPARK-15134
> URL: https://issues.apache.org/jira/browse/SPARK-15134
> Project: Spark
> Issue Type: Task
> Components: Examples
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Minor
> Fix For: 2.0.0
>
> This issue addresses the comments in SPARK-15031 and also fixes Java linter errors.
> - Use multiline format in SparkSession builder patterns.
> - Update `binary_classification_metrics_example.py` to use `SparkSession`.
> - Fix Java linter errors (in SPARK-13745, SPARK-15031, and so far)
[jira] [Resolved] (SPARK-15158) Too aggressive logging in SizeBasedRollingPolicy?
[ https://issues.apache.org/jira/browse/SPARK-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-15158.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Too aggressive logging in SizeBasedRollingPolicy?
> --------------------------------------------------
>
> Key: SPARK-15158
> URL: https://issues.apache.org/jira/browse/SPARK-15158
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 1.6.1
> Reporter: Kai Wang
> Priority: Trivial
> Fix For: 2.0.0
>
> The questionable line is this: https://github.com/apache/spark/blob/3e27940a19e7bab448f1af11d2065ecd1ec66197/core/src/main/scala/org/apache/spark/util/logging/RollingPolicy.scala#L116
> This will output a message *whenever* anything is logged at executor level. Like the following:
> SizeBasedRollingPolicy:59 83 + 140796 > 1048576
> SizeBasedRollingPolicy:59 83 + 140879 > 1048576
> SizeBasedRollingPolicy:59 83 + 140962 > 1048576
> ...
> This seems too aggressive. Should this at least be downgraded to debug level?
[jira] [Resolved] (SPARK-9926) Parallelize file listing for partitioned Hive table
[ https://issues.apache.org/jira/browse/SPARK-9926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-9926. -- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Parallelize file listing for partitioned Hive table > --- > > Key: SPARK-9926 > URL: https://issues.apache.org/jira/browse/SPARK-9926 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.4.1, 1.5.0 >Reporter: Cheolsoo Park >Assignee: Ryan Blue > Fix For: 2.0.0 > > > In Spark SQL, short queries like {{select * from table limit 10}} run very > slowly against partitioned Hive tables because of file listing. In > particular, if a large number of partitions are scanned on storage like S3, > the queries run extremely slowly. Here are some example benchmarks in my > environment- > * Parquet-backed Hive table > * Partitioned by dateint and hour > * Stored on S3 > ||\# of partitions||\# of files||runtime||query|| > |1|972|30 secs|select * from nccp_log where dateint=20150601 and hour=0 limit > 10;| > |24|13646|6 mins|select * from nccp_log where dateint=20150601 limit 10;| > |240|136222|1 hour|select * from nccp_log where dateint>=20150601 and > dateint<=20150610 limit 10;| > The problem is that {{TableReader}} constructs a separate HadoopRDD per Hive > partition path and groups them into a UnionRDD. Then, all the input files are > listed sequentially. In other tools such as Hive and Pig, this can be solved > by setting > [mapreduce.input.fileinputformat.list-status.num-threads|https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml] > high. But in Spark, since each HadoopRDD lists only one partition path, > setting this property doesn't help. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
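For reference, this is how the Hadoop-side knob mentioned above is set from a Spark application (assuming an existing SparkContext named sc); per the issue, it does not help in Spark, since each HadoopRDD lists only a single partition path:
{code}
// The listing parallelism Hive and Pig rely on; 20 threads is illustrative.
sc.hadoopConfiguration.setInt(
  "mapreduce.input.fileinputformat.list-status.num-threads", 20)
{code}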
[jira] [Updated] (SPARK-15158) Too aggressive logging in SizeBasedRollingPolicy?
[ https://issues.apache.org/jira/browse/SPARK-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15158: -- Assignee: Kai Wang > Too aggressive logging in SizeBasedRollingPolicy? > - > > Key: SPARK-15158 > URL: https://issues.apache.org/jira/browse/SPARK-15158 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.6.1 >Reporter: Kai Wang >Assignee: Kai Wang >Priority: Trivial > Fix For: 2.0.0 > > > The questionable line is this: > https://github.com/apache/spark/blob/3e27940a19e7bab448f1af11d2065ecd1ec66197/core/src/main/scala/org/apache/spark/util/logging/RollingPolicy.scala#L116 > This will output a message *whenever* anything is logged at the executor level, > like the following: > SizeBasedRollingPolicy:59 83 + 140796 > 1048576 > SizeBasedRollingPolicy:59 83 + 140879 > 1048576 > SizeBasedRollingPolicy:59 83 + 140962 > 1048576 > ... > This seems too aggressive. Should this at least be downgraded to debug level? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14893) Re-enable HiveSparkSubmitSuite SPARK-8489 test after HiveContext is removed
[ https://issues.apache.org/jira/browse/SPARK-14893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14893. --- Resolution: Fixed Assignee: Dilip Biswal Fix Version/s: 2.0.0 > Re-enable HiveSparkSubmitSuite SPARK-8489 test after HiveContext is removed > --- > > Key: SPARK-14893 > URL: https://issues.apache.org/jira/browse/SPARK-14893 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 2.0.0 >Reporter: Andrew Or >Assignee: Dilip Biswal > Fix For: 2.0.0 > > > The test was disabled in https://github.com/apache/spark/pull/12585. To > re-enable it we need to rebuild the jar using the updated source code. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15166) Move hive-specific conf setting to HiveSharedState
Andrew Or created SPARK-15166: - Summary: Move hive-specific conf setting to HiveSharedState Key: SPARK-15166 URL: https://issues.apache.org/jira/browse/SPARK-15166 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Andrew Or Assignee: Andrew Or Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15166) Move hive-specific conf setting from SparkSession
[ https://issues.apache.org/jira/browse/SPARK-15166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15166: -- Summary: Move hive-specific conf setting from SparkSession (was: Move hive-specific conf setting to HiveSharedState) > Move hive-specific conf setting from SparkSession > - > > Key: SPARK-15166 > URL: https://issues.apache.org/jira/browse/SPARK-15166 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or >Assignee: Andrew Or >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15167) Add public catalog implementation method to SparkSession
Andrew Or created SPARK-15167: - Summary: Add public catalog implementation method to SparkSession Key: SPARK-15167 URL: https://issues.apache.org/jira/browse/SPARK-15167 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Andrew Or Assignee: Andrew Or Right now there's no way to check whether a given SparkSession has Hive support. You can do `spark.conf.get("spark.sql.catalogImplementation")` but that's supposed to be hidden from the user. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
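Spelled out, the workaround described above reads as below; it depends on the internal conf key, which is exactly what the issue wants to stop exposing (assuming an existing SparkSession named spark):
{code}
// Internal workaround from the description: inspect the catalog conf.
val hasHiveSupport =
  spark.conf.get("spark.sql.catalogImplementation") == "hive"
{code}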
[jira] [Resolved] (SPARK-15152) Scaladoc and Code style Improvements
[ https://issues.apache.org/jira/browse/SPARK-15152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15152. --- Resolution: Fixed Assignee: Jacek Laskowski Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Scaladoc and Code style Improvements > > > Key: SPARK-15152 > URL: https://issues.apache.org/jira/browse/SPARK-15152 > Project: Spark > Issue Type: Improvement > Components: Documentation, ML, Spark Core, SQL, YARN >Affects Versions: 2.0.0 >Reporter: Jacek Laskowski >Assignee: Jacek Laskowski >Priority: Minor > Fix For: 2.0.0 > > > While doing code reviews for the Spark Notes I found many places with typos > and incorrect code style. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13566) Deadlock between MemoryStore and BlockManager
[ https://issues.apache.org/jira/browse/SPARK-13566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-13566: -- Assignee: cen yuhai > Deadlock between MemoryStore and BlockManager > - > > Key: SPARK-13566 > URL: https://issues.apache.org/jira/browse/SPARK-13566 > Project: Spark > Issue Type: Bug > Components: Block Manager, Spark Core >Affects Versions: 1.6.0 > Environment: Spark 1.6.0 hadoop2.2.0 jdk1.8.0_65 centOs 6.2 >Reporter: cen yuhai >Assignee: cen yuhai > > === > "block-manager-slave-async-thread-pool-1": > at org.apache.spark.storage.MemoryStore.remove(MemoryStore.scala:216) > - waiting to lock <0x0005895b09b0> (a > org.apache.spark.memory.UnifiedMemoryManager) > at > org.apache.spark.storage.BlockManager.removeBlock(BlockManager.scala:1114) > - locked <0x00058ed6aae0> (a org.apache.spark.storage.BlockInfo) > at > org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1101) > at > org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1101) > at scala.collection.immutable.Set$Set2.foreach(Set.scala:94) > at > org.apache.spark.storage.BlockManager.removeBroadcast(BlockManager.scala:1101) > at > org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply$mcI$sp(BlockManagerSlaveEndpoint.scala:65) > at > org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:65) > at > org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:65) > at > org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$1.apply(BlockManagerSlaveEndpoint.scala:84) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > "Executor task launch worker-10": > at > org.apache.spark.storage.BlockManager.dropFromMemory(BlockManager.scala:1032) > - waiting to lock <0x00059a0988b8> (a > org.apache.spark.storage.BlockInfo) > at > org.apache.spark.storage.BlockManager.dropFromMemory(BlockManager.scala:1009) > at > org.apache.spark.storage.MemoryStore$$anonfun$evictBlocksToFreeSpace$2.apply(MemoryStore.scala:460) > at > org.apache.spark.storage.MemoryStore$$anonfun$evictBlocksToFreeSpace$2.apply(MemoryStore.scala:449) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-15167) Add public catalog implementation method to SparkSession
[ https://issues.apache.org/jira/browse/SPARK-15167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15167. --- Resolution: Won't Fix > Add public catalog implementation method to SparkSession > > > Key: SPARK-15167 > URL: https://issues.apache.org/jira/browse/SPARK-15167 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or >Assignee: Andrew Or > > Right now there's no way to check whether a given SparkSession has Hive > support. You can do `spark.conf.get("spark.sql.catalogImplementation")` but > that's supposed to be hidden from the user. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13566) Deadlock between MemoryStore and BlockManager
[ https://issues.apache.org/jira/browse/SPARK-13566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-13566. --- Resolution: Fixed Fix Version/s: 1.6.2 > Deadlock between MemoryStore and BlockManager > - > > Key: SPARK-13566 > URL: https://issues.apache.org/jira/browse/SPARK-13566 > Project: Spark > Issue Type: Bug > Components: Block Manager, Spark Core >Affects Versions: 1.6.0 > Environment: Spark 1.6.0 hadoop2.2.0 jdk1.8.0_65 centOs 6.2 >Reporter: cen yuhai >Assignee: cen yuhai > Fix For: 1.6.2 > > > === > "block-manager-slave-async-thread-pool-1": > at org.apache.spark.storage.MemoryStore.remove(MemoryStore.scala:216) > - waiting to lock <0x0005895b09b0> (a > org.apache.spark.memory.UnifiedMemoryManager) > at > org.apache.spark.storage.BlockManager.removeBlock(BlockManager.scala:1114) > - locked <0x00058ed6aae0> (a org.apache.spark.storage.BlockInfo) > at > org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1101) > at > org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1101) > at scala.collection.immutable.Set$Set2.foreach(Set.scala:94) > at > org.apache.spark.storage.BlockManager.removeBroadcast(BlockManager.scala:1101) > at > org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply$mcI$sp(BlockManagerSlaveEndpoint.scala:65) > at > org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:65) > at > org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:65) > at > org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$1.apply(BlockManagerSlaveEndpoint.scala:84) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > "Executor task launch worker-10": > at > org.apache.spark.storage.BlockManager.dropFromMemory(BlockManager.scala:1032) > - waiting to lock <0x00059a0988b8> (a > org.apache.spark.storage.BlockInfo) > at > org.apache.spark.storage.BlockManager.dropFromMemory(BlockManager.scala:1009) > at > org.apache.spark.storage.MemoryStore$$anonfun$evictBlocksToFreeSpace$2.apply(MemoryStore.scala:460) > at > org.apache.spark.storage.MemoryStore$$anonfun$evictBlocksToFreeSpace$2.apply(MemoryStore.scala:449) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-15093) create/delete/rename directory for InMemoryCatalog operations if needed
[ https://issues.apache.org/jira/browse/SPARK-15093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15093. --- Resolution: Fixed Fix Version/s: 2.0.0 > create/delete/rename directory for InMemoryCatalog operations if needed > --- > > Key: SPARK-15093 > URL: https://issues.apache.org/jira/browse/SPARK-15093 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-15223) spark.executor.logs.rolling.maxSize wrongly referred to as spark.executor.logs.rolling.size.maxBytes
[ https://issues.apache.org/jira/browse/SPARK-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15223. --- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > spark.executor.logs.rolling.maxSize wrongly referred to as > spark.executor.logs.rolling.size.maxBytes > > > Key: SPARK-15223 > URL: https://issues.apache.org/jira/browse/SPARK-15223 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 1.6.1 >Reporter: Philipp Hoffmann >Priority: Trivial > Fix For: 2.0.0 > > > The configuration setting {{spark.executor.logs.rolling.size.maxBytes}} was > changed to {{spark.executor.logs.rolling.maxSize}} in 1.4 or so. There is > however still a reference in the documentation using the old name. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15223) spark.executor.logs.rolling.maxSize wrongly referred to as spark.executor.logs.rolling.size.maxBytes
[ https://issues.apache.org/jira/browse/SPARK-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15223: -- Priority: Minor (was: Trivial) > spark.executor.logs.rolling.maxSize wrongly referred to as > spark.executor.logs.rolling.size.maxBytes > > > Key: SPARK-15223 > URL: https://issues.apache.org/jira/browse/SPARK-15223 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 1.6.1 >Reporter: Philipp Hoffmann >Priority: Minor > Fix For: 2.0.0 > > > The configuration setting {{spark.executor.logs.rolling.size.maxBytes}} was > changed to {{spark.executor.logs.rolling.maxSize}} in 1.4 or so. There is > however still a reference in the documentation using the old name. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15223) spark.executor.logs.rolling.maxSize wrongly referred to as spark.executor.logs.rolling.size.maxBytes
[ https://issues.apache.org/jira/browse/SPARK-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15223: -- Fix Version/s: 1.6.2 > spark.executor.logs.rolling.maxSize wrongly referred to as > spark.executor.logs.rolling.size.maxBytes > > > Key: SPARK-15223 > URL: https://issues.apache.org/jira/browse/SPARK-15223 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 1.6.1 >Reporter: Philipp Hoffmann >Assignee: Philipp Hoffmann >Priority: Minor > Fix For: 1.6.2, 2.0.0 > > > The configuration setting {{spark.executor.logs.rolling.size.maxBytes}} was > changed to {{spark.executor.logs.rolling.maxSize}} in 1.4 or so. There is > however still a reference in the documentation using the old name. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15223) spark.executor.logs.rolling.maxSize wrongly referred to as spark.executor.logs.rolling.size.maxBytes
[ https://issues.apache.org/jira/browse/SPARK-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15223: -- Target Version/s: 1.6.2, 2.0.0 (was: 2.0.0) > spark.executor.logs.rolling.maxSize wrongly referred to as > spark.executor.logs.rolling.size.maxBytes > > > Key: SPARK-15223 > URL: https://issues.apache.org/jira/browse/SPARK-15223 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 1.6.1 >Reporter: Philipp Hoffmann >Assignee: Philipp Hoffmann >Priority: Minor > Fix For: 1.6.2, 2.0.0 > > > The configuration setting {{spark.executor.logs.rolling.size.maxBytes}} was > changed to {{spark.executor.logs.rolling.maxSize}} in 1.4 or so. There is > however still a reference in the documentation using the old name. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15223) spark.executor.logs.rolling.maxSize wrongly referred to as spark.executor.logs.rolling.size.maxBytes
[ https://issues.apache.org/jira/browse/SPARK-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15223: -- Assignee: Philipp Hoffmann > spark.executor.logs.rolling.maxSize wrongly referred to as > spark.executor.logs.rolling.size.maxBytes > > > Key: SPARK-15223 > URL: https://issues.apache.org/jira/browse/SPARK-15223 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 1.6.1 >Reporter: Philipp Hoffmann >Assignee: Philipp Hoffmann >Priority: Minor > Fix For: 1.6.2, 2.0.0 > > > The configuration setting {{spark.executor.logs.rolling.size.maxBytes}} was > changed to {{spark.executor.logs.rolling.maxSize}} in 1.4 or so. There is > however still a reference in the documentation using the old name. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
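For clarity, a sketch using the post-1.4 property name that the documentation should reference (values are illustrative):
{code}
import org.apache.spark.SparkConf

// Current names; the docs wrongly pointed at spark.executor.logs.rolling.size.maxBytes.
val conf = new SparkConf()
  .set("spark.executor.logs.rolling.strategy", "size")
  .set("spark.executor.logs.rolling.maxSize", "1048576") // bytes per log file
{code}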
[jira] [Resolved] (SPARK-15225) Replace SQLContext with SparkSession in Encoder documentation
[ https://issues.apache.org/jira/browse/SPARK-15225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15225. --- Resolution: Fixed Assignee: Liang-Chi Hsieh Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Replace SQLContext with SparkSession in Encoder documentation > - > > Key: SPARK-15225 > URL: https://issues.apache.org/jira/browse/SPARK-15225 > Project: Spark > Issue Type: Improvement > Components: Documentation >Reporter: Liang-Chi Hsieh >Assignee: Liang-Chi Hsieh >Priority: Minor > Fix For: 2.0.0 > > > Encoder's doc mentions sqlContext.implicits._. We should use > sparkSession.implicits._ instead now. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
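A sketch of the documentation change, assuming a local-mode SparkSession named spark:
{code}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local").getOrCreate()
// was: import sqlContext.implicits._
import spark.implicits._
val ds = Seq(1, 2, 3).toDS() // the encoder comes from the implicits
{code}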
[jira] [Resolved] (SPARK-15067) YARN executors are launched with fixed perm gen size
[ https://issues.apache.org/jira/browse/SPARK-15067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15067. --- Resolution: Fixed Assignee: Sean Owen Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > YARN executors are launched with fixed perm gen size > > > Key: SPARK-15067 > URL: https://issues.apache.org/jira/browse/SPARK-15067 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.6.0, 1.6.1 >Reporter: Renato Falchi Brandão >Assignee: Sean Owen >Priority: Minor > Fix For: 2.0.0 > > > It is impossible to change the executors' max perm gen size using the property > "spark.executor.extraJavaOptions" when you are running on YARN. > When the JVM option "-XX:MaxPermSize" is set through the property > "spark.executor.extraJavaOptions", Spark puts it properly in the shell command > that will start the JVM container but, at the end of the command, it sets > this option again using a fixed value of 256m, as you can see in the log I've > extracted: > 2016-04-30 17:20:12 INFO ExecutorRunnable:58 - > === > YARN executor launch context: > env: > CLASSPATH -> > {{PWD}}{{PWD}}/__spark__.jar$HADOOP_CONF_DIR/usr/hdp/current/hadoop-client/*/usr/hdp/current/hadoop-client/lib/*/usr/hdp/current/hadoop-hdfs-client/*/usr/hdp/current/hadoop-hdfs-client/lib/*/usr/hdp/current/hadoop-yarn-client/*/usr/hdp/current/hadoop-yarn-client/lib/*/usr/hdp/mr-framework/hadoop/share/hadoop/mapreduce/*:/usr/hdp/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:/usr/hdp/mr-framework/hadoop/share/hadoop/common/*:/usr/hdp/mr-framework/hadoop/share/hadoop/common/lib/*:/usr/hdp/mr-framework/hadoop/share/hadoop/yarn/*:/usr/hdp/mr-framework/hadoop/share/hadoop/yarn/lib/*:/usr/hdp/mr-framework/hadoop/share/hadoop/hdfs/*:/usr/hdp/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/current/hadoop/lib/hadoop-lzo-0.6.0.jar:/etc/hadoop/conf/secure > SPARK_LOG_URL_STDERR -> > http://x0668sl.x.br:8042/node/containerlogs/container_1456962126505_329993_01_02/h_loadbd/stderr?start=-4096 > SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1456962126505_329993 > SPARK_YARN_CACHE_FILES_FILE_SIZES -> 191719054,166 > SPARK_USER -> h_loadbd > SPARK_YARN_CACHE_FILES_VISIBILITIES -> PUBLIC,PUBLIC > SPARK_YARN_MODE -> true > SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1459806496093,1459808508343 > SPARK_LOG_URL_STDOUT -> > http://x0668sl.x.br:8042/node/containerlogs/container_1456962126505_329993_01_02/h_loadbd/stdout?start=-4096 > SPARK_YARN_CACHE_FILES -> > hdfs://x/user/datalab/hdp/spark/lib/spark-assembly-1.6.0.2.3.4.1-10-hadoop2.7.1.2.3.4.1-10.jar#__spark__.jar,hdfs://tlvcluster/user/datalab/hdp/spark/conf/hive-site.xml#hive-site.xml > command: > {{JAVA_HOME}}/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms6144m > -Xmx6144m '-XX:+PrintGCDetails' '-XX:MaxPermSize=1024M' > '-XX:+PrintGCTimeStamps' -Djava.io.tmpdir={{PWD}}/tmp > '-Dspark.akka.timeout=30' '-Dspark.driver.port=62875' > '-Dspark.rpc.askTimeout=30' '-Dspark.rpc.lookupTimeout=30' > -Dspark.yarn.app.container.log.dir= -XX:MaxPermSize=256m > org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url > spark://CoarseGrainedScheduler@10.125.81.42:62875 --executor-id 1 --hostname > x0668sl.x.br --cores 1 --app-id application_1456962126505_329993 > --user-class-path file:$PWD/__app__.jar 1> /stdout 2> > /stderr > Analyzing the code, it is possible to see that all the options set in the > property "spark.executor.extraJavaOptions" are enclosed, one by one, in > single quotes (ExecutorRunnable.scala:151) before the launcher takes the > decision whether a default value has to be provided for the option > "-XX:MaxPermSize" (ExecutorRunnable.scala:202). > This decision is taken by examining all the options set and looking for a string > starting with the value "-XX:MaxPermSize" (CommandBuilderUtils.java:328). If > that value is not found, the default value is set. > Since every option has been wrapped in single quotes, no option starts with that > string, so the check never matches and the default value is always provided. > A possible solution is to change the source code of CommandBuilderUtils.java at > line 328: > From-> if (arg.startsWith("-XX:MaxPermSize=")) > To-> if (arg.indexOf("-XX:MaxPermSize=") > -1) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
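The setting the report is about, as a user would pass it (values taken from the log above); on affected versions the launcher's trailing -XX:MaxPermSize=256m still wins on YARN:
{code}
import org.apache.spark.SparkConf

// On affected versions, YARN executors ignore this MaxPermSize because the
// launcher appends its own -XX:MaxPermSize=256m after it.
val conf = new SparkConf()
  .set("spark.executor.extraJavaOptions",
    "-XX:MaxPermSize=1024M -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")
{code}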
[jira] [Resolved] (SPARK-15220) Add hyperlink to "running application" and "completed application"
[ https://issues.apache.org/jira/browse/SPARK-15220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15220. --- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Add hyperlink to "running application" and "completed application" > -- > > Key: SPARK-15220 > URL: https://issues.apache.org/jira/browse/SPARK-15220 > Project: Spark > Issue Type: Improvement > Components: Web UI >Reporter: Mao, Wei >Priority: Minor > Fix For: 2.0.0 > > > Add hyperlink to "running application" and "completed application", so users > can jump to the application table directly. In my environment, I set up 1000+ > workers and it's painful to scroll down to skip the worker list. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15210) Add missing @DeveloperApi annotation in sql.types
[ https://issues.apache.org/jira/browse/SPARK-15210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15210: -- Assignee: zhengruifeng > Add missing @DeveloperApi annotation in sql.types > - > > Key: SPARK-15210 > URL: https://issues.apache.org/jira/browse/SPARK-15210 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: zhengruifeng >Assignee: zhengruifeng >Priority: Trivial > Fix For: 2.0.0 > > > The @DeveloperApi annotations for {{AbstractDataType}}, {{MapType}}, and > {{UserDefinedType}} are missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-15166) Move hive-specific conf setting from SparkSession
[ https://issues.apache.org/jira/browse/SPARK-15166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15166. --- Resolution: Fixed Fix Version/s: 2.0.0 > Move hive-specific conf setting from SparkSession > - > > Key: SPARK-15166 > URL: https://issues.apache.org/jira/browse/SPARK-15166 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or >Assignee: Andrew Or >Priority: Minor > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-15210) Add missing @DeveloperApi annotation in sql.types
[ https://issues.apache.org/jira/browse/SPARK-15210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15210. --- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Add missing @DeveloperApi annotation in sql.types > - > > Key: SPARK-15210 > URL: https://issues.apache.org/jira/browse/SPARK-15210 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: zhengruifeng >Assignee: zhengruifeng >Priority: Trivial > Fix For: 2.0.0 > > > The @DeveloperApi annotations for {{AbstractDataType}}, {{MapType}}, and > {{UserDefinedType}} are missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-10653) Remove unnecessary things from SparkEnv
[ https://issues.apache.org/jira/browse/SPARK-10653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-10653. --- Resolution: Fixed Assignee: Alex Bozarth Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Remove unnecessary things from SparkEnv > --- > > Key: SPARK-10653 > URL: https://issues.apache.org/jira/browse/SPARK-10653 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Andrew Or >Assignee: Alex Bozarth > Fix For: 2.0.0 > > > As of the writing of this message, there are at least two things that can be > removed from it: > {code} > @DeveloperApi > class SparkEnv ( > val executorId: String, > private[spark] val rpcEnv: RpcEnv, > val serializer: Serializer, > val closureSerializer: Serializer, > val cacheManager: CacheManager, > val mapOutputTracker: MapOutputTracker, > val shuffleManager: ShuffleManager, > val broadcastManager: BroadcastManager, > val blockTransferService: BlockTransferService, // this one can go > val blockManager: BlockManager, > val securityManager: SecurityManager, > val httpFileServer: HttpFileServer, > val sparkFilesDir: String, // this one maybe? It's only used in 1 place. > val metricsSystem: MetricsSystem, > val shuffleMemoryManager: ShuffleMemoryManager, > val executorMemoryManager: ExecutorMemoryManager, // this can go > val outputCommitCoordinator: OutputCommitCoordinator, > val conf: SparkConf) extends Logging { > ... > } > {code} > We should avoid adding to this infinite list of things in SparkEnv's > constructors if they're not needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14021) Support custom context derived from HiveContext for SparkSQLEnv
[ https://issues.apache.org/jira/browse/SPARK-14021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15276820#comment-15276820 ] Andrew Or commented on SPARK-14021: --- Closing as Won't Fix because the issue is outdated after HiveContext was removed. > Support custom context derived from HiveContext for SparkSQLEnv > --- > > Key: SPARK-14021 > URL: https://issues.apache.org/jira/browse/SPARK-14021 > Project: Spark > Issue Type: New Feature > Components: SQL >Reporter: Adrian Wang > > This is to create a custom context for the commands bin/spark-sql and > sbin/start-thriftserver. Any context that is derived from HiveContext is > acceptable. Users need to configure the class name of the custom context in the > config spark.sql.context.class, and make sure the class is on the classpath. This > provides a more elegant way to make custom configurations and changes for the > infrastructure team. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14021) Support custom context derived from HiveContext for SparkSQLEnv
[ https://issues.apache.org/jira/browse/SPARK-14021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14021. --- Resolution: Won't Fix > Support custom context derived from HiveContext for SparkSQLEnv > --- > > Key: SPARK-14021 > URL: https://issues.apache.org/jira/browse/SPARK-14021 > Project: Spark > Issue Type: New Feature > Components: SQL >Reporter: Adrian Wang > > This is to create a custom context for the commands bin/spark-sql and > sbin/start-thriftserver. Any context that is derived from HiveContext is > acceptable. Users need to configure the class name of the custom context in the > config spark.sql.context.class, and make sure the class is on the classpath. This > provides a more elegant way to make custom configurations and changes for the > infrastructure team. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15234) spark.catalog.listDatabases.show() is not formatted correctly
Andrew Or created SPARK-15234: - Summary: spark.catalog.listDatabases.show() is not formatted correctly Key: SPARK-15234 URL: https://issues.apache.org/jira/browse/SPARK-15234 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Andrew Or {code} scala> spark.catalog.listDatabases.show() ++---+---+ |name|description|locationUri| ++---+---+ |Database[name='de...| |Database[name='my...| |Database[name='so...| ++---+---+ {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15234) spark.catalog.listDatabases.show() is not formatted correctly
[ https://issues.apache.org/jira/browse/SPARK-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15234: -- Description: {code} scala> spark.catalog.listDatabases.show() ++---+---+ |name|description|locationUri| ++---+---+ |Database[name='de...| |Database[name='my...| |Database[name='so...| ++---+---+ {code} It's because org.apache.spark.sql.catalog.Database is not a case class! was: {code} scala> spark.catalog.listDatabases.show() ++---+---+ |name|description|locationUri| ++---+---+ |Database[name='de...| |Database[name='my...| |Database[name='so...| ++---+---+ {code} > spark.catalog.listDatabases.show() is not formatted correctly > - > > Key: SPARK-15234 > URL: https://issues.apache.org/jira/browse/SPARK-15234 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or > > {code} > scala> spark.catalog.listDatabases.show() > ++---+---+ > |name|description|locationUri| > ++---+---+ > |Database[name='de...| > |Database[name='my...| > |Database[name='so...| > ++---+---+ > {code} > It's because org.apache.spark.sql.catalog.Database is not a case class! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-15234) spark.catalog.listDatabases.show() is not formatted correctly
[ https://issues.apache.org/jira/browse/SPARK-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or reassigned SPARK-15234: - Assignee: Andrew Or > spark.catalog.listDatabases.show() is not formatted correctly > - > > Key: SPARK-15234 > URL: https://issues.apache.org/jira/browse/SPARK-15234 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or >Assignee: Andrew Or > > {code} > scala> spark.catalog.listDatabases.show() > ++---+---+ > |name|description|locationUri| > ++---+---+ > |Database[name='de...| > |Database[name='my...| > |Database[name='so...| > ++---+---+ > {code} > It's because org.apache.spark.sql.catalog.Database is not a case class! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
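A sketch of the behavior the diagnosis points at, using a hypothetical, simplified Database class (not necessarily the actual fix): a case class gets a proper encoder, so show() renders one column per field.
{code}
// Hypothetical, simplified stand-in for org.apache.spark.sql.catalog.Database.
case class Database(name: String, description: String, locationUri: String)

import spark.implicits._ // assumes an existing SparkSession named spark
val dbs = Seq(Database("default", "default database", "file:/tmp/warehouse"))
spark.createDataset(dbs).show() // columns: name, description, locationUri
{code}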
[jira] [Created] (SPARK-15236) No way to disable Hive support in REPL
Andrew Or created SPARK-15236: - Summary: No way to disable Hive support in REPL Key: SPARK-15236 URL: https://issues.apache.org/jira/browse/SPARK-15236 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Andrew Or Assignee: Andrew Or If you built Spark with Hive classes, there's no switch to flip to start a new `spark-shell` using the InMemoryCatalog. The only thing you can do now is to rebuild Spark again. That is quite inconvenient. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15236) No way to disable Hive support in REPL
[ https://issues.apache.org/jira/browse/SPARK-15236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15236: -- Assignee: (was: Andrew Or) > No way to disable Hive support in REPL > -- > > Key: SPARK-15236 > URL: https://issues.apache.org/jira/browse/SPARK-15236 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or > > If you built Spark with Hive classes, there's no switch to flip to start a > new `spark-shell` using the InMemoryCatalog. The only thing you can do now is > to rebuild Spark again. That is quite inconvenient. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15236) No way to disable Hive support in REPL
[ https://issues.apache.org/jira/browse/SPARK-15236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15236: -- Component/s: Spark Shell > No way to disable Hive support in REPL > -- > > Key: SPARK-15236 > URL: https://issues.apache.org/jira/browse/SPARK-15236 > Project: Spark > Issue Type: Bug > Components: Spark Shell, SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or > > If you built Spark with Hive classes, there's no switch to flip to start a > new `spark-shell` using the InMemoryCatalog. The only thing you can do now is > to rebuild Spark again. That is quite inconvenient. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
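A sketch of the kind of switch being asked for, reusing the internal conf key mentioned in SPARK-15167; whether a Hive-enabled build honors it at session construction is exactly the open question here:
{code}
import org.apache.spark.sql.SparkSession

// Hypothetical: pick the in-memory catalog without rebuilding Spark.
val spark = SparkSession.builder()
  .master("local")
  .config("spark.sql.catalogImplementation", "in-memory")
  .getOrCreate()
{code}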
[jira] [Resolved] (SPARK-15037) Use SparkSession instead of SQLContext in testsuites
[ https://issues.apache.org/jira/browse/SPARK-15037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15037. --- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Use SparkSession instead of SQLContext in testsuites > > > Key: SPARK-15037 > URL: https://issues.apache.org/jira/browse/SPARK-15037 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Reporter: Dongjoon Hyun >Assignee: Sandeep Singh > Fix For: 2.0.0 > > > This issue aims to update the existing testsuites to use `SparkSession` > instead of `SQLContext` since `SQLContext` exists just for backward > compatibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15037) Use SparkSession instead of SQLContext in testsuites
[ https://issues.apache.org/jira/browse/SPARK-15037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15037: -- Component/s: Tests > Use SparkSession instead of SQLContext in testsuites > > > Key: SPARK-15037 > URL: https://issues.apache.org/jira/browse/SPARK-15037 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Reporter: Dongjoon Hyun >Assignee: Sandeep Singh > Fix For: 2.0.0 > > > This issue aims to update the existing testsuites to use `SparkSession` > instead of `SQLContext` since `SQLContext` exists just for backward > compatibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15037) Use SparkSession instead of SQLContext in testsuites
[ https://issues.apache.org/jira/browse/SPARK-15037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15037: -- Component/s: SQL > Use SparkSession instead of SQLContext in testsuites > > > Key: SPARK-15037 > URL: https://issues.apache.org/jira/browse/SPARK-15037 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Reporter: Dongjoon Hyun >Assignee: Sandeep Singh > Fix For: 2.0.0 > > > This issue aims to update the existing testsuites to use `SparkSession` > instead of `SQLContext` since `SQLContext` exists just for backward > compatibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
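The testsuite migration is mostly mechanical; a sketch of the before/after, assuming a local-mode session:
{code}
import org.apache.spark.sql.SparkSession

// was: val df = sqlContext.range(10).toDF()
val spark = SparkSession.builder().master("local").getOrCreate()
val df = spark.range(10).toDF() // SQLContext remains only for backward compatibility
{code}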
[jira] [Updated] (SPARK-14684) Verification of partition specs in SessionCatalog
[ https://issues.apache.org/jira/browse/SPARK-14684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14684: -- Assignee: Xiao Li > Verification of partition specs in SessionCatalog > - > > Key: SPARK-14684 > URL: https://issues.apache.org/jira/browse/SPARK-14684 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Xiao Li > > When users input an invalid partition spec, we might not be able to catch > it and issue an error message. Sometimes, it could cause a disastrous result. > For example, previously, when we altered a table and dropped a partition with an > invalid spec, it could drop all the partitions due to a bug/defect in the Hive > Metastore API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14603) SessionCatalog needs to check if a metadata operation is valid
[ https://issues.apache.org/jira/browse/SPARK-14603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14603. --- Resolution: Fixed Assignee: Xiao Li Fix Version/s: 2.0.0 > SessionCatalog needs to check if a metadata operation is valid > -- > > Key: SPARK-14603 > URL: https://issues.apache.org/jira/browse/SPARK-14603 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Xiao Li >Priority: Critical > Fix For: 2.0.0 > > > Since we cannot really trust the underlying external catalog to throw > exceptions when there is an invalid metadata operation, let's do it in > SessionCatalog. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14857) Table/Database Name Validation in SessionCatalog
[ https://issues.apache.org/jira/browse/SPARK-14857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14857: -- Assignee: Xiao Li > Table/Database Name Validation in SessionCatalog > > > Key: SPARK-14857 > URL: https://issues.apache.org/jira/browse/SPARK-14857 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Xiao Li > > We need to validate the database/table names before storing this information in > `ExternalCatalog`. > For example, if users use backticks to quote table/database names > containing illegal characters, these names are allowed by the Spark parser, but the > Hive metastore does not allow them. We need to catch them in SessionCatalog > and issue an appropriate error message. > ``` > CREATE TABLE `tab:1` ... > ``` > This PR enforces the name rules of Spark SQL for `table`/`database`/`view`: > `can only contain alphanumeric and underscore characters.` Unlike > Hive, we allow names starting with underscore characters. > The validation of function/column names will be done in a separate JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
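A sketch of the rule, assuming a SparkSession named spark; the parser accepts the backticked name, and the new validation in SessionCatalog is what rejects it:
{code}
// Parsed fine, but should now fail name validation: ':' is not allowed.
spark.sql("CREATE TABLE `tab:1` (id INT)")

// Fine: alphanumerics and underscores, including a leading underscore.
spark.sql("CREATE TABLE _tab_1 (id INT)")
{code}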
[jira] [Created] (SPARK-15257) Require CREATE EXTERNAL TABLE to specify LOCATION
Andrew Or created SPARK-15257: - Summary: Require CREATE EXTERNAL TABLE to specify LOCATION Key: SPARK-15257 URL: https://issues.apache.org/jira/browse/SPARK-15257 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Andrew Or Assignee: Andrew Or Right now when the user runs `CREATE EXTERNAL TABLE` without specifying `LOCATION`, the table will still be created in the warehouse directory, but its data won't be deleted even when the user drops the table! This is a problem. We should require the user to also specify `LOCATION`. Note: This does not apply to `CREATE EXTERNAL TABLE ... USING`, which is not yet supported. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
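A sketch of the rejected and required forms under this proposal (table name and path are illustrative; spark is an existing Hive-enabled SparkSession):
{code}
// Would be rejected under this proposal: external table without a location.
spark.sql("CREATE EXTERNAL TABLE logs (id INT)")

// Required form: the user supplies the location explicitly.
spark.sql("CREATE EXTERNAL TABLE logs (id INT) LOCATION '/data/logs'")
{code}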
[jira] [Resolved] (SPARK-15249) Use FunctionResource instead of (String, String) in CreateFunction and CatalogFunction for resource
[ https://issues.apache.org/jira/browse/SPARK-15249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15249. --- Resolution: Fixed Assignee: Sandeep Singh Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Use FunctionResource instead of (String, String) in CreateFunction and > CatalogFunction for resource > --- > > Key: SPARK-15249 > URL: https://issues.apache.org/jira/browse/SPARK-15249 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Sandeep Singh >Assignee: Sandeep Singh >Priority: Minor > Fix For: 2.0.0 > > > Use FunctionResource instead of (String, String) in CreateFunction and > CatalogFunction for resource > see: TODO's here > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala#L36 > https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala#L42 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15262) race condition in killing an executor and reregistering an executor
[ https://issues.apache.org/jira/browse/SPARK-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15262: -- Target Version/s: 1.6.2, 2.0.0 > race condition in killing an executor and reregistering an executor > --- > > Key: SPARK-15262 > URL: https://issues.apache.org/jira/browse/SPARK-15262 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1 >Reporter: Shixiong Zhu > > There is a race condition when killing an executor and reregistering an > executor happen at the same time. Here are the execution steps to reproduce it. > 1. master finds a worker is dead > 2. master tells driver to remove executor > 3. driver removes executor > 4. BlockManagerMasterEndpoint removes the block manager > 5. executor finds it's not registered via heartbeat > 6. executor sends reregister block manager > 7. master registers the block manager > 8. executor is killed by worker > 9. CoarseGrainedSchedulerBackend ignores onDisconnected as this address is > not in the executor list > 10. BlockManagerMasterEndpoint.blockManagerInfo contains dead block managers > As BlockManagerMasterEndpoint.blockManagerInfo contains some dead block > managers, when we unpersist an RDD, remove a broadcast, or clean a shuffle > block via an RPC endpoint of a dead block manager, we will get > ClosedChannelException. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13566) Deadlock between MemoryStore and BlockManager
[ https://issues.apache.org/jira/browse/SPARK-13566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280615#comment-15280615 ] Andrew Or commented on SPARK-13566: --- [~ekeddy] This only happens with the unified memory manager, so you could switch back to the static memory manager by setting `spark.memory.useLegacyMode` to true. You may observe a decrease in performance if you do that, however. > Deadlock between MemoryStore and BlockManager > - > > Key: SPARK-13566 > URL: https://issues.apache.org/jira/browse/SPARK-13566 > Project: Spark > Issue Type: Bug > Components: Block Manager, Spark Core >Affects Versions: 1.6.0 > Environment: Spark 1.6.0 hadoop2.2.0 jdk1.8.0_65 centOs 6.2 >Reporter: cen yuhai >Assignee: cen yuhai > Fix For: 1.6.2 > > > === > "block-manager-slave-async-thread-pool-1": > at org.apache.spark.storage.MemoryStore.remove(MemoryStore.scala:216) > - waiting to lock <0x0005895b09b0> (a > org.apache.spark.memory.UnifiedMemoryManager) > at > org.apache.spark.storage.BlockManager.removeBlock(BlockManager.scala:1114) > - locked <0x00058ed6aae0> (a org.apache.spark.storage.BlockInfo) > at > org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1101) > at > org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1101) > at scala.collection.immutable.Set$Set2.foreach(Set.scala:94) > at > org.apache.spark.storage.BlockManager.removeBroadcast(BlockManager.scala:1101) > at > org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply$mcI$sp(BlockManagerSlaveEndpoint.scala:65) > at > org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:65) > at > org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:65) > at > org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$1.apply(BlockManagerSlaveEndpoint.scala:84) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > "Executor task launch worker-10": > at > org.apache.spark.storage.BlockManager.dropFromMemory(BlockManager.scala:1032) > - waiting to lock <0x00059a0988b8> (a > org.apache.spark.storage.BlockInfo) > at > org.apache.spark.storage.BlockManager.dropFromMemory(BlockManager.scala:1009) > at > org.apache.spark.storage.MemoryStore$$anonfun$evictBlocksToFreeSpace$2.apply(MemoryStore.scala:460) > at > org.apache.spark.storage.MemoryStore$$anonfun$evictBlocksToFreeSpace$2.apply(MemoryStore.scala:449) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
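The workaround from the comment above, spelled out for 1.6.x builds hitting this deadlock:
{code}
import org.apache.spark.SparkConf

// Fall back to the static memory manager; may cost some performance, as noted.
val conf = new SparkConf().set("spark.memory.useLegacyMode", "true")
{code}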