[jira] [Updated] (SPARK-12317) Support configuring values with units (e.g. KB/MB/GB) in SQL
[ https://issues.apache.org/jira/browse/SPARK-12317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yadong Qi updated SPARK-12317: -- Summary: Support configuring values with units (e.g. KB/MB/GB) in SQL (was: Values should be configurable with units (e.g. KB/MB/GB) in SQL) > Support configuring values with units (e.g. KB/MB/GB) in SQL > - > > Key: SPARK-12317 > URL: https://issues.apache.org/jira/browse/SPARK-12317 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 1.5.2 > Reporter: Yadong Qi > > e.g. `spark.sql.autoBroadcastJoinThreshold` should be configurable as `10MB` > instead of `10485760`, because `10MB` is easier to read than `10485760`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12317) Values should be configurable with units (e.g. KB/MB/GB) in SQL
Yadong Qi created SPARK-12317: - Summary: Values should be configurable with units (e.g. KB/MB/GB) in SQL Key: SPARK-12317 URL: https://issues.apache.org/jira/browse/SPARK-12317 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.5.2 Reporter: Yadong Qi e.g. `spark.sql.autoBroadcastJoinThreshold` should be configurable as `10MB` instead of `10485760`, because `10MB` is easier to read than `10485760`.
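The proposal above amounts to accepting a unit-suffixed string wherever a byte count is expected. Below is a minimal sketch of such a parser; it is illustrative only, not Spark's implementation, and the function name and binary (1024-based) multipliers are assumptions:

```python
import re

# Binary multipliers (1 KB = 1024 B), matching how byte-count configs such as
# spark.sql.autoBroadcastJoinThreshold are usually interpreted.
_UNITS = {"b": 1, "k": 1 << 10, "kb": 1 << 10, "m": 1 << 20, "mb": 1 << 20,
          "g": 1 << 30, "gb": 1 << 30, "t": 1 << 40, "tb": 1 << 40}

def byte_string_as_bytes(s: str) -> int:
    """Parse '10MB', '512k', or a bare '10485760' into a byte count."""
    m = re.fullmatch(r"\s*(\d+)\s*([a-zA-Z]*)\s*", s)
    if not m:
        raise ValueError(f"invalid size string: {s!r}")
    value, unit = int(m.group(1)), m.group(2).lower()
    if unit and unit not in _UNITS:
        raise ValueError(f"unknown size unit in: {s!r}")
    # A bare number (empty unit) is taken as plain bytes.
    return value * _UNITS.get(unit, 1)
```

With this, `byte_string_as_bytes("10MB")` and the raw `"10485760"` yield the same value, which is exactly the convenience the issue asks for.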
[jira] [Assigned] (SPARK-12311) [CORE] Restore previous value of "os.arch" property in test suites after forcing it to a specific value
[ https://issues.apache.org/jira/browse/SPARK-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12311: Assignee: Apache Spark > [CORE] Restore previous value of "os.arch" property in test suites after > forcing it to a specific value > > > Key: SPARK-12311 > URL: https://issues.apache.org/jira/browse/SPARK-12311 > Project: Spark > Issue Type: Bug > Components: Spark Core > Affects Versions: 1.5.2 > Reporter: Kazuaki Ishizaki > Assignee: Apache Spark > Priority: Minor > > Although the current BlockManagerSuite.scala and SizeEstimatorSuite.scala set a > specific value (e.g. "amd64") into the system property "os.arch", they do not > restore the original value of "os.arch" after these test suites run. This may > lead to failures in test cases that depend on the architecture on platforms > other than amd64. > They should save the original value of "os.arch" and restore it at the end > of these test suites. > > https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala > https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/util/SizeEstimatorSuite.scala
[jira] [Assigned] (SPARK-12311) [CORE] Restore previous value of "os.arch" property in test suites after forcing it to a specific value
[ https://issues.apache.org/jira/browse/SPARK-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12311: Assignee: (was: Apache Spark) > [CORE] Restore previous value of "os.arch" property in test suites after > forcing it to a specific value > > > Key: SPARK-12311 > URL: https://issues.apache.org/jira/browse/SPARK-12311 > Project: Spark > Issue Type: Bug > Components: Spark Core > Affects Versions: 1.5.2 > Reporter: Kazuaki Ishizaki > Priority: Minor > > Although the current BlockManagerSuite.scala and SizeEstimatorSuite.scala set a > specific value (e.g. "amd64") into the system property "os.arch", they do not > restore the original value of "os.arch" after these test suites run. This may > lead to failures in test cases that depend on the architecture on platforms > other than amd64. > They should save the original value of "os.arch" and restore it at the end > of these test suites. > > https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala > https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/util/SizeEstimatorSuite.scala
[jira] [Commented] (SPARK-12311) [CORE] Restore previous value of "os.arch" property in test suites after forcing it to a specific value
[ https://issues.apache.org/jira/browse/SPARK-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055578#comment-15055578 ] Apache Spark commented on SPARK-12311: -- User 'kiszk' has created a pull request for this issue: https://github.com/apache/spark/pull/10289 > [CORE] Restore previous value of "os.arch" property in test suites after > forcing it to a specific value > > > Key: SPARK-12311 > URL: https://issues.apache.org/jira/browse/SPARK-12311 > Project: Spark > Issue Type: Bug > Components: Spark Core > Affects Versions: 1.5.2 > Reporter: Kazuaki Ishizaki > Priority: Minor > > Although the current BlockManagerSuite.scala and SizeEstimatorSuite.scala set a > specific value (e.g. "amd64") into the system property "os.arch", they do not > restore the original value of "os.arch" after these test suites run. This may > lead to failures in test cases that depend on the architecture on platforms > other than amd64. > They should save the original value of "os.arch" and restore it at the end > of these test suites. > > https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala > https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/util/SizeEstimatorSuite.scala
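The fix requested above is a save-and-restore pattern around the forced property. Sketched here in Python rather than the Scala of the affected suites; the names are illustrative, and the Scala code would do the same thing with System.setProperty/System.clearProperty:

```python
from contextlib import contextmanager

@contextmanager
def forced_property(props: dict, key: str, value: str):
    """Force props[key] to value for the duration of a test, then restore
    the original value, or remove the key if it was absent before."""
    missing = object()               # sentinel: distinguishes "absent" from None
    saved = props.get(key, missing)
    props[key] = value
    try:
        yield
    finally:
        if saved is missing:
            props.pop(key, None)     # key did not exist: clear it again
        else:
            props[key] = saved       # key existed: put the old value back
```

Used as `with forced_property(props, "os.arch", "amd64"): ...`, a test can no longer leak the forced value into later suites, which is precisely the failure mode the issue describes.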
[jira] [Updated] (SPARK-12275) No plan for BroadcastHint in some condition
[ https://issues.apache.org/jira/browse/SPARK-12275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-12275: - Target Version/s: 1.5.3 > No plan for BroadcastHint in some condition > --- > > Key: SPARK-12275 > URL: https://issues.apache.org/jira/browse/SPARK-12275 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: yucai >Assignee: yucai > Fix For: 1.6.1, 2.0.0 > > > *Summary* > No plan for BroadcastHint is generated in some condition. > *Test Case* > {code} > val df1 = Seq((1, "1"), (2, "2")).toDF("key", "value") > val parquetTempFile = > "%s/SPARK-_%d.parquet".format(System.getProperty("java.io.tmpdir"), > scala.util.Random.nextInt) > df1.write.parquet(parquetTempFile) > val pf1 = sqlContext.read.parquet(parquetTempFile) > #1. df1.join(broadcast(pf1)).count() > #2. broadcast(pf1).count() > {code} > *Result* > It will trigger assertion in QueryPlanner.scala, like below: > {code} > scala> df1.join(broadcast(pf1)).count() > java.lang.AssertionError: assertion failed: No plan for BroadcastHint > +- Relation[key#6,value#7] > ParquetRelation[hdfs://10.1.0.20:8020/tmp/SPARK-_1817830406.parquet] > at scala.Predef$.assert(Predef.scala:179) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) > at > org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:336) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) > {code} -- This message was sent by 
Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12275) No plan for BroadcastHint in some condition
[ https://issues.apache.org/jira/browse/SPARK-12275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-12275: - Fix Version/s: 2.0.0 1.6.1 > No plan for BroadcastHint in some condition > --- > > Key: SPARK-12275 > URL: https://issues.apache.org/jira/browse/SPARK-12275 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: yucai >Assignee: yucai > Fix For: 1.6.1, 2.0.0 > > > *Summary* > No plan for BroadcastHint is generated in some condition. > *Test Case* > {code} > val df1 = Seq((1, "1"), (2, "2")).toDF("key", "value") > val parquetTempFile = > "%s/SPARK-_%d.parquet".format(System.getProperty("java.io.tmpdir"), > scala.util.Random.nextInt) > df1.write.parquet(parquetTempFile) > val pf1 = sqlContext.read.parquet(parquetTempFile) > #1. df1.join(broadcast(pf1)).count() > #2. broadcast(pf1).count() > {code} > *Result* > It will trigger assertion in QueryPlanner.scala, like below: > {code} > scala> df1.join(broadcast(pf1)).count() > java.lang.AssertionError: assertion failed: No plan for BroadcastHint > +- Relation[key#6,value#7] > ParquetRelation[hdfs://10.1.0.20:8020/tmp/SPARK-_1817830406.parquet] > at scala.Predef$.assert(Predef.scala:179) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) > at > org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:336) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) > {code} -- This message was sent by 
Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12316) Stack overflow from endless `Delegation token thread` calls when the application ends
[ https://issues.apache.org/jira/browse/SPARK-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SaintBacchus updated SPARK-12316: - Description: When the application ends, the AM cleans the staging dir. But if the driver then triggers a delegation token update, it cannot find the right token file and endlessly calls the method 'updateCredentialsIfRequired', which leads to a StackOverflowError. was: When the application ends, the AM cleans the staging dir. But if the driver then triggers a delegation token update, it cannot find the right token file and endlessly calls the method 'updateCredentialsIfRequired' > Stack overflow from endless `Delegation token thread` calls when the > application ends > --- > > Key: SPARK-12316 > URL: https://issues.apache.org/jira/browse/SPARK-12316 > Project: Spark > Issue Type: Bug > Components: YARN > Affects Versions: 1.6.0 > Reporter: SaintBacchus > > When the application ends, the AM cleans the staging dir. > But if the driver then triggers a delegation token update, it cannot find > the right token file and endlessly calls the method > 'updateCredentialsIfRequired'. > This leads to a StackOverflowError.
[jira] [Updated] (SPARK-12275) No plan for BroadcastHint in some condition
[ https://issues.apache.org/jira/browse/SPARK-12275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-12275: - Assignee: yucai > No plan for BroadcastHint in some condition > --- > > Key: SPARK-12275 > URL: https://issues.apache.org/jira/browse/SPARK-12275 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: yucai >Assignee: yucai > > *Summary* > No plan for BroadcastHint is generated in some condition. > *Test Case* > {code} > val df1 = Seq((1, "1"), (2, "2")).toDF("key", "value") > val parquetTempFile = > "%s/SPARK-_%d.parquet".format(System.getProperty("java.io.tmpdir"), > scala.util.Random.nextInt) > df1.write.parquet(parquetTempFile) > val pf1 = sqlContext.read.parquet(parquetTempFile) > #1. df1.join(broadcast(pf1)).count() > #2. broadcast(pf1).count() > {code} > *Result* > It will trigger assertion in QueryPlanner.scala, like below: > {code} > scala> df1.join(broadcast(pf1)).count() > java.lang.AssertionError: assertion failed: No plan for BroadcastHint > +- Relation[key#6,value#7] > ParquetRelation[hdfs://10.1.0.20:8020/tmp/SPARK-_1817830406.parquet] > at scala.Predef$.assert(Predef.scala:179) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) > at > org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:336) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To 
unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12316) Stack overflow from endless `Delegation token thread` calls when the application ends
SaintBacchus created SPARK-12316: Summary: Stack overflow from endless `Delegation token thread` calls when the application ends Key: SPARK-12316 URL: https://issues.apache.org/jira/browse/SPARK-12316 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.6.0 Reporter: SaintBacchus When the application ends, the AM cleans the staging dir. But if the driver then triggers a delegation token update, it cannot find the right token file and endlessly calls the method 'updateCredentialsIfRequired'
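The reported cycle can be avoided by iterating with an explicit stop condition instead of re-invoking the update recursively. A hypothetical sketch of that shape, with illustrative function and callback names rather than Spark's actual YARN code:

```python
def update_credentials_if_required(read_token_file, app_ended, max_attempts=3):
    """Try to refresh credentials; return True on success, False if we gave up.

    If each failed update re-invoked itself recursively, a permanently
    missing token file (e.g. staging dir already cleaned at shutdown)
    would recurse until the stack overflows. Iterating with a bounded
    attempt count and an app_ended() guard breaks that cycle.
    """
    for _ in range(max_attempts):
        if app_ended():
            return False              # staging dir is gone; stop retrying
        try:
            read_token_file()
            return True
        except FileNotFoundError:
            continue                  # retry by looping, never by recursing
    return False
```

The key design choice is that a missing token file is treated as a retryable (and ultimately abandonable) condition rather than a reason to schedule another identical call.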
[jira] [Resolved] (SPARK-12213) Query with only one distinct should not have Expand
[ https://issues.apache.org/jira/browse/SPARK-12213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-12213. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10228 [https://github.com/apache/spark/pull/10228] > Query with only one distinct should not have Expand > > > Key: SPARK-12213 > URL: https://issues.apache.org/jira/browse/SPARK-12213 > Project: Spark > Issue Type: Improvement > Components: SQL > Reporter: Davies Liu > Assignee: Davies Liu > Fix For: 2.0.0 > > > Expand will double the number of records and slow down projection and > aggregation; it's better to generate a plan without Expand for a query with > only one distinct (for example, ss_max in TPC-DS)
[jira] [Commented] (SPARK-12176) SparkLauncher's setConf() does not support configs containing spaces
[ https://issues.apache.org/jira/browse/SPARK-12176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055508#comment-15055508 ] Saisai Shao commented on SPARK-12176: - It works fine in my local test against the latest master branch; there seems to be no such issue. Probably it only exists in older versions of Spark. > SparkLauncher's setConf() does not support configs containing spaces > > > Key: SPARK-12176 > URL: https://issues.apache.org/jira/browse/SPARK-12176 > Project: Spark > Issue Type: Bug > Components: Spark Core > Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2 > Environment: All > Reporter: Yuhang Chen > Priority: Minor > > spark-submit uses the '--conf K=V' pattern for setting configs. According to > the docs, if the 'V' you set has spaces in it, the whole 'K=V' part should > be wrapped in quotes. > However, SparkLauncher (org.apache.spark.launcher.SparkLauncher) does > not do that wrapping for you, and there is no way to do the wrapping yourself > with the API it provides. > For example, I want to add {{-XX:+PrintGCDetails -XX:+PrintGCTimeStamps}} for > executors (spark.executor.extraJavaOptions), and the conf contains a space. > For spark-submit, I should wrap the conf in quotes like this: > {code} > --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails > -XX:+PrintGCTimeStamps" > {code} > But when I use the setConf() API of SparkLauncher, I write code like this: > {code} > launcher.setConf("spark.executor.extraJavaOptions", "-XX:+PrintGCDetails > -XX:+PrintGCTimeStamps"); > {code} > Now, SparkLauncher uses Java's ProcessBuilder to start a sub-process, in > which spark-submit is finally executed. And it turns out that the final > command is like this: > {code} > --conf spark.executor.extraJavaOptions=-XX:+PrintGCDetails > -XX:+PrintGCTimeStamps > {code} > See? The quotes are gone, and the job could not be launched with this
> command. > Then I checked the source: all confs are stored in a Map before the launch > command is generated. Thus, my advice is to check all values of the conf Map > and do the wrapping during command building.
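The advice in the report, quoting any conf value that contains whitespace while the launch command is being built, can be sketched as below. This assumes the resulting command line is later interpreted by a shell (with ProcessBuilder's list-of-args form no quoting is needed); the function name and conf-map shape are illustrative, not SparkLauncher's real internals:

```python
import shlex

def build_conf_args(conf: dict) -> list:
    """Build '--conf K=V' argument pairs, shell-quoting each K=V so a value
    with spaces (e.g. multiple JVM flags) survives as a single argument."""
    args = []
    for key, value in conf.items():
        args += ["--conf", shlex.quote(f"{key}={value}")]
    return args
```

For `{"spark.executor.extraJavaOptions": "-XX:+PrintGCDetails -XX:+PrintGCTimeStamps"}` this emits the single-quoted `'K=V'` form, which is exactly the wrapping the reporter had to add by hand for spark-submit.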
[jira] [Resolved] (SPARK-12281) Fixed potential exceptions when exiting a local cluster.
[ https://issues.apache.org/jira/browse/SPARK-12281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-12281. -- Resolution: Fixed Assignee: Shixiong Zhu (was: Apache Spark) Fix Version/s: 2.0.0 1.6.1 > Fixed potential exceptions when exiting a local cluster. > > > Key: SPARK-12281 > URL: https://issues.apache.org/jira/browse/SPARK-12281 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu > Fix For: 1.6.1, 2.0.0 > > > Fixed the following potential exceptions when exiting a local cluster. > {code} > java.lang.AssertionError: assertion failed: executor 4 state transfer from > RUNNING to RUNNING is illegal > at scala.Predef$.assert(Predef.scala:179) > at > org.apache.spark.deploy.master.Master$$anonfun$receive$1.applyOrElse(Master.scala:260) > at > org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116) > at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204) > at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) > at > org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} > {code} > java.lang.IllegalStateException: Shutdown hooks cannot be modified during > shutdown. 
> at > org.apache.spark.util.SparkShutdownHookManager.add(ShutdownHookManager.scala:246) > at > org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:191) > at > org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:180) > at > org.apache.spark.deploy.worker.ExecutorRunner.start(ExecutorRunner.scala:73) > at > org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:474) > at > org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116) > at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204) > at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) > at > org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12180) DataFrame.join() in PySpark gives misleading exception when column name exists on both sides
[ https://issues.apache.org/jira/browse/SPARK-12180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055429#comment-15055429 ] Jeff Zhang commented on SPARK-12180: Could you paste your code? Joining two DataFrames with common fields works fine for me {code} In [12]: df1.join(df2, df1.name==df2.name) // both have column "id" besides the join key "name" Out[12]: DataFrame[id: bigint, name: string, id: bigint, name: bigint] {code} > DataFrame.join() in PySpark gives misleading exception when column name > exists on both sides > --- > > Key: SPARK-12180 > URL: https://issues.apache.org/jira/browse/SPARK-12180 > Project: Spark > Issue Type: Bug > Components: PySpark > Affects Versions: 1.5.2 > Reporter: Daniel Thomas > > When joining two DataFrames on a column 'session_uuid' I got the following > exception, because both DataFrames had a column called 'at'. The exception is > misleading about the cause and about the column causing the problem. Renaming > the column fixed the exception. > --- > Py4JJavaError Traceback (most recent call last) > /Applications/spark-1.5.2-bin-hadoop2.4/python/pyspark/sql/utils.py in > deco(*a, **kw) > 35 try: > ---> 36 return f(*a, **kw) > 37 except py4j.protocol.Py4JJavaError as e: > /Applications/spark-1.5.2-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py > in get_return_value(answer, gateway_client, target_id, name) > 299 'An error occurred while calling {0}{1}{2}.\n'. > --> 300 format(target_id, '.', name), value) > 301 else: > Py4JJavaError: An error occurred while calling o484.join.
> : org.apache.spark.sql.AnalysisException: resolved attribute(s) > session_uuid#3278 missing from > uuid_x#9078,total_session_sec#9115L,at#3248,session_uuid#9114,uuid#9117,at#9084 > in operator !Join Inner, Some((uuid_x#9078 = session_uuid#3278)); > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:37) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:44) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:154) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:49) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:103) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:49) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:44) > at > org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:914) > at org.apache.spark.sql.DataFrame.(DataFrame.scala:132) > at > org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$logicalPlanToDataFrame(DataFrame.scala:154) > at org.apache.spark.sql.DataFrame.join(DataFrame.scala:553) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231) > at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379) > at py4j.Gateway.invoke(Gateway.java:259) > at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133) > at py4j.commands.CallCommand.execute(CallCommand.java:79) > at py4j.GatewayConnection.run(GatewayConnection.java:207) > at java.lang.Thread.run(Thread.java:745) > During 
handling of the above exception, another exception occurred: > AnalysisException Traceback (most recent call last) > in () > 1 sel_starts = starts.select('uuid', 'at').withColumnRenamed('uuid', > 'uuid_x')#.withColumnRenamed('at', 'at_x') > 2 sel_closes = closes.select('uuid', 'at', 'session_uuid', > 'total_session_sec') > > 3 start_close = sel_starts.join(sel_closes, sel_starts['uuid_x'] == > sel_closes['session_uuid']) > 4 start_close.cache() > 5 start_close.take(1) > /Applications/spark-1.5.2-bin-hadoop2.4/python/pyspark/sql/dataframe.py in > join(self, other, on, how) > 579 on = on[0] > 580 if how is None: > --> 581 jdf = self._jdf.join(other._jdf, on._jc, "inner") > 582 else: > 583 assert isinst
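The workaround mentioned in the description, renaming the clashing column before the join, generalizes to suffixing any name that appears on both sides. A pure-Python stand-in for that rename step (the function name and suffix are illustrative; in PySpark the same effect comes from withColumnRenamed, as in the reporter's snippet):

```python
def disambiguate(left_cols, right_cols, suffix="_x"):
    """Return the left-side column names, suffixing any name that also
    appears on the right side so a later join cannot confuse the two."""
    taken = set(right_cols)
    return [c + suffix if c in taken else c for c in left_cols]
```

Applied to the reporter's schemas, both 'uuid' and the troublesome 'at' get suffixed on the left side, which is exactly the manual rename that made the exception go away.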
[jira] [Commented] (SPARK-12057) Prevent failure on corrupt JSON records
[ https://issues.apache.org/jira/browse/SPARK-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055414#comment-15055414 ] Apache Spark commented on SPARK-12057: -- User 'yhuai' has created a pull request for this issue: https://github.com/apache/spark/pull/10288 > Prevent failure on corrupt JSON records > --- > > Key: SPARK-12057 > URL: https://issues.apache.org/jira/browse/SPARK-12057 > Project: Spark > Issue Type: Improvement > Components: SQL > Reporter: Ian Macalinao > Priority: Minor > > Return the failed record when a record cannot be parsed, allowing parsing of > files containing corrupt records of any form. Currently a corrupt record > throws an exception, causing the entire job to fail.
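The behavior requested above, keep parsing and capture malformed records instead of failing the whole job, can be sketched with a line-oriented JSON reader. The `_corrupt_record` field name mirrors Spark's convention for its JSON reader; the rest is an illustrative stand-in, not Spark's implementation:

```python
import json

def parse_json_lines(lines):
    """Parse one JSON document per line; malformed lines are kept as rows
    with their raw text under '_corrupt_record' instead of raising."""
    rows = []
    for line in lines:
        try:
            rows.append(json.loads(line))
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            rows.append({"_corrupt_record": line})
    return rows
```

A downstream job can then filter on the presence of `_corrupt_record` to inspect or drop bad rows, rather than losing the entire run to one bad line.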
[jira] [Commented] (SPARK-12315) isnotnull operator not pushed down for JDBC datasource.
[ https://issues.apache.org/jira/browse/SPARK-12315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055358#comment-15055358 ] Apache Spark commented on SPARK-12315: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/10287 > isnotnull operator not pushed down for JDBC datasource. > --- > > Key: SPARK-12315 > URL: https://issues.apache.org/jira/browse/SPARK-12315 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 1.6.0 > Reporter: Hyukjin Kwon > > {{IsNotNull}} filter is not being pushed down for the JDBC datasource. > It looks like this is part of the SQL standard (SQL-92, SQL:1999, SQL:2003, > SQL:201x), and I believe most databases support it.
[jira] [Assigned] (SPARK-12315) isnotnull operator not pushed down for JDBC datasource.
[ https://issues.apache.org/jira/browse/SPARK-12315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12315: Assignee: Apache Spark > isnotnull operator not pushed down for JDBC datasource. > --- > > Key: SPARK-12315 > URL: https://issues.apache.org/jira/browse/SPARK-12315 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 1.6.0 > Reporter: Hyukjin Kwon > Assignee: Apache Spark > > {{IsNotNull}} filter is not being pushed down for the JDBC datasource. > It looks like this is part of the SQL standard (SQL-92, SQL:1999, SQL:2003, > SQL:201x), and I believe most databases support it.
[jira] [Assigned] (SPARK-12315) isnotnull operator not pushed down for JDBC datasource.
[ https://issues.apache.org/jira/browse/SPARK-12315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12315: Assignee: (was: Apache Spark) > isnotnull operator not pushed down for JDBC datasource. > --- > > Key: SPARK-12315 > URL: https://issues.apache.org/jira/browse/SPARK-12315 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 1.6.0 > Reporter: Hyukjin Kwon > > {{IsNotNull}} filter is not being pushed down for the JDBC datasource. > It looks like this is part of the SQL standard (SQL-92, SQL:1999, SQL:2003, > SQL:201x), and I believe most databases support it.
[jira] [Commented] (SPARK-12314) isnull operator not pushed down for JDBC datasource.
[ https://issues.apache.org/jira/browse/SPARK-12314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055356#comment-15055356 ] Apache Spark commented on SPARK-12314: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/10286 > isnull operator not pushed down for JDBC datasource. > > > Key: SPARK-12314 > URL: https://issues.apache.org/jira/browse/SPARK-12314 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 1.6.0 > Reporter: Hyukjin Kwon > > {{IsNull}} filter is not being pushed down for the JDBC datasource. > It looks like this is part of the SQL standard (SQL-92, SQL:1999, SQL:2003, > SQL:201x), and I believe most databases support it.
[jira] [Assigned] (SPARK-12314) isnull operator not pushed down for JDBC datasource.
[ https://issues.apache.org/jira/browse/SPARK-12314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12314: Assignee: Apache Spark > isnull operator not pushed down for JDBC datasource. > > > Key: SPARK-12314 > URL: https://issues.apache.org/jira/browse/SPARK-12314 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 1.6.0 > Reporter: Hyukjin Kwon > Assignee: Apache Spark > > {{IsNull}} filter is not being pushed down for the JDBC datasource. > It looks like this is part of the SQL standard (SQL-92, SQL:1999, SQL:2003, > SQL:201x), and I believe most databases support it.
[jira] [Assigned] (SPARK-12314) isnull operator not pushed down for JDBC datasource.
[ https://issues.apache.org/jira/browse/SPARK-12314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12314: Assignee: (was: Apache Spark) > isnull operator not pushed down for JDBC datasource. > > > Key: SPARK-12314 > URL: https://issues.apache.org/jira/browse/SPARK-12314 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 1.6.0 > Reporter: Hyukjin Kwon > > {{IsNull}} filter is not being pushed down for the JDBC datasource. > It looks like this is part of the SQL standard (SQL-92, SQL:1999, SQL:2003, > SQL:201x), and I believe most databases support it.
[jira] [Commented] (SPARK-12315) isnotnull operator not pushed down for JDBC datasource.
[ https://issues.apache.org/jira/browse/SPARK-12315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055347#comment-15055347 ] Hyukjin Kwon commented on SPARK-12315: -- I will work on this. > isnotnull operator not pushed down for JDBC datasource. > --- > > Key: SPARK-12315 > URL: https://issues.apache.org/jira/browse/SPARK-12315 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.0 >Reporter: Hyukjin Kwon > > {{IsNotNull}} filter is not being pushed down for the JDBC datasource. > It looks like this is part of the SQL standard (SQL-92, SQL:1999, SQL:2003 and > SQL:201x) and I believe most databases support it.
[jira] [Commented] (SPARK-12314) isnull operator not pushed down for JDBC datasource.
[ https://issues.apache.org/jira/browse/SPARK-12314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055344#comment-15055344 ] Hyukjin Kwon commented on SPARK-12314: -- I will work on this. > isnull operator not pushed down for JDBC datasource. > > > Key: SPARK-12314 > URL: https://issues.apache.org/jira/browse/SPARK-12314 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.0 >Reporter: Hyukjin Kwon > > {{IsNull}} filter is not being pushed down for the JDBC datasource. > It looks like this is part of the SQL standard (SQL-92, SQL:1999, SQL:2003, SQL:201x) > and I believe most databases support it.
[jira] [Created] (SPARK-12315) isnotnull operator not pushed down for JDBC datasource.
Hyukjin Kwon created SPARK-12315: Summary: isnotnull operator not pushed down for JDBC datasource. Key: SPARK-12315 URL: https://issues.apache.org/jira/browse/SPARK-12315 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.6.0 Reporter: Hyukjin Kwon {{IsNotNull}} filter is not being pushed down for the JDBC datasource. It looks like this is part of the SQL standard (SQL-92, SQL:1999, SQL:2003 and SQL:201x) and I believe most databases support it.
[jira] [Created] (SPARK-12314) isnull operator not pushed down for JDBC datasource.
Hyukjin Kwon created SPARK-12314: Summary: isnull operator not pushed down for JDBC datasource. Key: SPARK-12314 URL: https://issues.apache.org/jira/browse/SPARK-12314 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.6.0 Reporter: Hyukjin Kwon {{IsNull}} filter is not being pushed down for the JDBC datasource. It looks like this is part of the SQL standard (SQL-92, SQL:1999, SQL:2003, SQL:201x) and I believe most databases support it.
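The pushdown behavior these two issues describe can be sketched as a small filter-to-SQL compiler. The sketch below is standalone and only loosely modeled on Spark's `org.apache.spark.sql.sources` filter API; the `FilterCompiler` object and its methods are illustrative names, not Spark's actual JDBC internals. Filters that compile to a SQL fragment run in the database; anything returning `None` would stay in Spark and be evaluated after the rows are fetched.

```scala
// Simplified model of JDBC filter pushdown: each source filter is compiled
// to a SQL WHERE fragment; filters that cannot be compiled are not pushed
// down and must be evaluated by Spark after the fetch.
sealed trait Filter
case class EqualTo(attr: String, value: Any) extends Filter
case class IsNull(attr: String) extends Filter
case class IsNotNull(attr: String) extends Filter

object FilterCompiler {
  // Returns None for filters the JDBC source cannot push down.
  def compile(f: Filter): Option[String] = f match {
    case EqualTo(a, v: String) => Some(s"$a = '$v'")
    case EqualTo(a, v)         => Some(s"$a = $v")
    case IsNull(a)             => Some(s"$a IS NULL")     // the gap SPARK-12314 fills
    case IsNotNull(a)          => Some(s"$a IS NOT NULL") // the gap SPARK-12315 fills
  }

  def whereClause(filters: Seq[Filter]): String = {
    val compiled = filters.flatMap(compile)
    if (compiled.isEmpty) "" else compiled.mkString("WHERE ", " AND ", "")
  }
}

println(FilterCompiler.whereClause(Seq(IsNull("name"), EqualTo("age", 30))))
// WHERE name IS NULL AND age = 30
```

Since `IS NULL` / `IS NOT NULL` are standard SQL, compiling them unconditionally is safe for essentially any JDBC backend, which is the argument the reporter makes.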
[jira] [Commented] (SPARK-12288) Support UnsafeRow in Coalesce/Except/Intersect
[ https://issues.apache.org/jira/browse/SPARK-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055256#comment-15055256 ] Apache Spark commented on SPARK-12288: -- User 'gatorsmile' has created a pull request for this issue: https://github.com/apache/spark/pull/10285 > Support UnsafeRow in Coalesce/Except/Intersect > -- > > Key: SPARK-12288 > URL: https://issues.apache.org/jira/browse/SPARK-12288 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu >
[jira] [Assigned] (SPARK-12288) Support UnsafeRow in Coalesce/Except/Intersect
[ https://issues.apache.org/jira/browse/SPARK-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12288: Assignee: (was: Apache Spark) > Support UnsafeRow in Coalesce/Except/Intersect > -- > > Key: SPARK-12288 > URL: https://issues.apache.org/jira/browse/SPARK-12288 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu >
[jira] [Assigned] (SPARK-12288) Support UnsafeRow in Coalesce/Except/Intersect
[ https://issues.apache.org/jira/browse/SPARK-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12288: Assignee: Apache Spark > Support UnsafeRow in Coalesce/Except/Intersect > -- > > Key: SPARK-12288 > URL: https://issues.apache.org/jira/browse/SPARK-12288 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu >Assignee: Apache Spark >
[jira] [Issue Comment Deleted] (SPARK-12288) Support UnsafeRow in Coalesce/Except/Intersect
[ https://issues.apache.org/jira/browse/SPARK-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-12288: Comment: was deleted (was: [~davies] It sounds like Except and Intersect can support UnsafeRow? We just need to add the following lines to Except and Intersect {code} override def outputsUnsafeRows: Boolean = children.forall(_.outputsUnsafeRows) override def canProcessUnsafeRows: Boolean = true override def canProcessSafeRows: Boolean = true {code} Is my understanding correct? Thanks!) > Support UnsafeRow in Coalesce/Except/Intersect > -- > > Key: SPARK-12288 > URL: https://issues.apache.org/jira/browse/SPARK-12288 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu >
[jira] [Commented] (SPARK-12062) Master rebuilding historical SparkUI should be asynchronous
[ https://issues.apache.org/jira/browse/SPARK-12062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055207#comment-15055207 ] Apache Spark commented on SPARK-12062: -- User 'BryanCutler' has created a pull request for this issue: https://github.com/apache/spark/pull/10284 > Master rebuilding historical SparkUI should be asynchronous > --- > > Key: SPARK-12062 > URL: https://issues.apache.org/jira/browse/SPARK-12062 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 1.0.0 >Reporter: Andrew Or >Assignee: Bryan Cutler > > When a long-running application finishes, it takes a while (sometimes > minutes) to rebuild the SparkUI. However, in Master.scala this is currently > done within the RPC event loop, which runs in only one thread. Thus, in the > meantime no other applications can register with this master.
[jira] [Commented] (SPARK-12062) Master rebuilding historical SparkUI should be asynchronous
[ https://issues.apache.org/jira/browse/SPARK-12062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055205#comment-15055205 ] Bryan Cutler commented on SPARK-12062: -- I read the past conversations discussing this in the related JIRAs and agree that it would be better to eventually remove this functionality from the master. I'll go ahead and post the PR I have ready; maybe it will be of some use in the meantime, before SPARK-12299. > Master rebuilding historical SparkUI should be asynchronous > --- > > Key: SPARK-12062 > URL: https://issues.apache.org/jira/browse/SPARK-12062 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 1.0.0 >Reporter: Andrew Or >Assignee: Bryan Cutler > > When a long-running application finishes, it takes a while (sometimes > minutes) to rebuild the SparkUI. However, in Master.scala this is currently > done within the RPC event loop, which runs in only one thread. Thus, in the > meantime no other applications can register with this master.
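The fix direction discussed in this issue can be sketched as follows: instead of running the slow UI rebuild inside the single-threaded RPC event loop, hand it to a separate thread so the loop returns immediately. This is a minimal standalone sketch; `rebuildUiAsync` and the app ID are illustrative names, not Spark's actual Master internals.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

// Run the slow rebuild on its own thread so the caller (the "event loop")
// is not blocked and can keep registering applications.
def rebuildUiAsync(appId: String)(slowRebuild: String => Unit): Thread = {
  val worker = new Thread(() => slowRebuild(appId), s"rebuild-ui-$appId")
  worker.setDaemon(true) // don't keep the JVM alive just for UI rebuilds
  worker.start()
  worker
}

val done = new CountDownLatch(1)
rebuildUiAsync("app-20151213-0001") { _ =>
  Thread.sleep(100) // stands in for minutes of event-log replay
  done.countDown()
}
// Control returns here at once; the event loop stays responsive.
println("event loop free while rebuild runs")
done.await(5, TimeUnit.SECONDS)
```

The trade-off, as noted in the related discussion, is that offloading only hides the cost; removing the rebuild from the Master entirely (SPARK-12299) is the cleaner long-term fix.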
[jira] [Updated] (SPARK-12313) getPartitionsByFilter doesn't handle predicates on all / multiple Partition Columns
[ https://issues.apache.org/jira/browse/SPARK-12313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gobinathan SP updated SPARK-12313: -- Description: When spark.sql.hive.metastorePartitionPruning is enabled, getPartitionsByFilter is used. For a table partitioned by p1 and p2, when hc.sql("select col from tabl1 where p1='p1V' and p2= 'p2V' ") is triggered, the HiveShim identifies the predicates and ConvertFilters returns p1='p1V' and col2= 'p2V'. In these cases no result is returned; the number of returned rows is always zero. However, a filter on a single column always works. Probably it doesn't come through this route. I'm using Oracle for the Metastore. was: When spark.sql.hive.metastorePartitionPruning is enabled, getPartitionsByFilter is used. For a table partitioned by p1 and p2, when hc.sql("select col from tabl1 where p1='p1V' and p2= 'p2V' ") is triggered, the HiveShim identifies the predicates and ConvertFilters returns p1='p1V' and col2= 'p2V'. In these cases no result is returned; the number of returned rows is always zero. However, a filter on a single column always works. Probably it doesn't come through this route. > getPartitionsByFilter doesn't handle predicates on all / multiple Partition > Columns > -- > > Key: SPARK-12313 > URL: https://issues.apache.org/jira/browse/SPARK-12313 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.1 >Reporter: Gobinathan SP >Priority: Minor > > When spark.sql.hive.metastorePartitionPruning is enabled, > getPartitionsByFilter is used. > For a table partitioned by p1 and p2, when hc.sql("select col > from tabl1 where p1='p1V' and p2= 'p2V' ") is triggered, > the HiveShim identifies the predicates and ConvertFilters returns p1='p1V' > and col2= 'p2V'. > In these cases no result is returned; the number of returned rows is > always zero. > However, a filter on a single column always works. Probably it doesn't come > through this route. > I'm using Oracle for the Metastore.
[jira] [Updated] (SPARK-12313) getPartitionsByFilter doesn't handle predicates on all / multiple Partition Columns
[ https://issues.apache.org/jira/browse/SPARK-12313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gobinathan SP updated SPARK-12313: -- Description: When spark.sql.hive.metastorePartitionPruning is enabled, getPartitionsByFilter is used. For a table partitioned by p1 and p2, when hc.sql("select col from tabl1 where p1='p1V' and p2= 'p2V' ") is triggered, the HiveShim identifies the predicates and ConvertFilters returns p1='p1V' and col2= 'p2V'. In these cases no result is returned; the number of returned rows is always zero. However, a filter on a single column always works. Probably it doesn't come through this route. was: When spark.sql.hive.metastorePartitionPruning is enabled, getPartitionsByFilter is used. For a table partitioned by p1 and p2, when hc.sql("select col from tabl1 where p1='p1V' and p2= 'p2V' ") is triggered, the HiveShim identifies the predicates and ConvertFilters returns p1='p1V' and col2= 'p2V'. In these cases no result is returned; the number of returned rows is always zero. However, a filter on a single column always works. Probably it doesn't come through this route. > getPartitionsByFilter doesn't handle predicates on all / multiple Partition > Columns > -- > > Key: SPARK-12313 > URL: https://issues.apache.org/jira/browse/SPARK-12313 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.1 >Reporter: Gobinathan SP >Priority: Minor > > When spark.sql.hive.metastorePartitionPruning is enabled, > getPartitionsByFilter is used. > For a table partitioned by p1 and p2, when hc.sql("select col > from tabl1 where p1='p1V' and p2= 'p2V' ") is triggered, > the HiveShim identifies the predicates and ConvertFilters returns p1='p1V' > and col2= 'p2V'. > In these cases no result is returned; the number of returned rows is > always zero. > However, a filter on a single column always works. Probably it doesn't come > through this route.
[jira] [Updated] (SPARK-12313) getPartitionsByFilter doesn't handle predicates on all / multiple Partition Columns
[ https://issues.apache.org/jira/browse/SPARK-12313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gobinathan SP updated SPARK-12313: -- Description: When spark.sql.hive.metastorePartitionPruning is enabled, getPartitionsByFilter is used. For a table partitioned by p1 and p2, when hc.sql("select col from tabl1 where p1='p1V' and p2= 'p2V' ") is triggered, the HiveShim identifies the predicates and ConvertFilters returns p1='p1V' and col2= 'p2V'. In these cases no result is returned; the number of returned rows is always zero. However, a filter on a single column always works. Probably it doesn't come through this route. was: When spark.sql.hive.metastorePartitionPruning is enabled, getPartitionsByFilter is used. For a table partitioned by p1 and p2, when hc.sql("select col from tabl1 where p1='p1V' and col2= 'p2V' ") is triggered, the HiveShim identifies the predicates and ConvertFilters returns p1='p1V' and col2= 'p2V'. In these cases no result is returned; the number of returned rows is always zero. However, a filter on a single column always works. Probably it doesn't come through this route. > getPartitionsByFilter doesn't handle predicates on all / multiple Partition > Columns > -- > > Key: SPARK-12313 > URL: https://issues.apache.org/jira/browse/SPARK-12313 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.1 >Reporter: Gobinathan SP >Priority: Minor > > When spark.sql.hive.metastorePartitionPruning is enabled, > getPartitionsByFilter is used. > For a table partitioned by p1 and p2, when hc.sql("select col > from tabl1 where p1='p1V' and p2= 'p2V' ") is triggered, > the HiveShim identifies the predicates and ConvertFilters returns p1='p1V' > and col2= 'p2V'. > In these cases no result is returned; the number of returned rows is > always zero. > However, a filter on a single column always works. Probably it doesn't come > through this route.
[jira] [Created] (SPARK-12313) getPartitionsByFilter doesn't handle predicates on all / multiple Partition Columns
Gobinathan SP created SPARK-12313: - Summary: getPartitionsByFilter doesn't handle predicates on all / multiple Partition Columns Key: SPARK-12313 URL: https://issues.apache.org/jira/browse/SPARK-12313 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.1 Reporter: Gobinathan SP Priority: Minor When spark.sql.hive.metastorePartitionPruning is enabled, getPartitionsByFilter is used. For a table partitioned by p1 and p2, when hc.sql("select col from tabl1 where p1='p1V' and col2= 'p2V' ") is triggered, the HiveShim identifies the predicates and ConvertFilters returns p1='p1V' and col2= 'p2V'. In these cases no result is returned; the number of returned rows is always zero. However, a filter on a single column always works. Probably it doesn't come through this route.
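The conversion step the reporter describes (turning predicates on partition columns into a single metastore filter string) can be modeled with a toy function. `convertFilters` is the name used in the report; the sketch below is standalone and illustrative, not Spark's actual HiveShim code. The point is that equality predicates on every partition column must survive the conversion and be joined together, or the metastore call returns zero partitions for multi-column filters.

```scala
// Toy model: compile equality predicates on partition columns into one
// metastore partition-filter string, joined with "and". Predicates on
// non-partition columns are dropped here and evaluated by Spark instead.
def convertFilters(partitionCols: Set[String], predicates: Seq[(String, String)]): String =
  predicates
    .filter { case (col, _) => partitionCols.contains(col) }
    .map { case (col, value) => s"""$col = "$value"""" }
    .mkString(" and ")

println(convertFilters(Set("p1", "p2"), Seq("p1" -> "p1V", "p2" -> "p2V", "col" -> "x")))
// p1 = "p1V" and p2 = "p2V"
```

If the second partition-column predicate were lost or mangled at this stage, the filter sent to the metastore would match no partitions, which would produce exactly the "always zero rows" symptom reported above.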
[jira] [Commented] (SPARK-11410) Add a DataFrame API that provides functionality similar to HiveQL's DISTRIBUTE BY
[ https://issues.apache.org/jira/browse/SPARK-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055198#comment-15055198 ] Yin Huai commented on SPARK-11410: -- Oh, I see. This is the table partitioning mechanism. If you use partitionBy before writing this table, we will understand that the table is partitioned by column {{column}} and can skip unnecessary partitions when scanning the table. This JIRA is actually for another feature, which lets users control how data is shuffled by using the hash value of given columns. > Add a DataFrame API that provides functionality similar to HiveQL's > DISTRIBUTE BY > - > > Key: SPARK-11410 > URL: https://issues.apache.org/jira/browse/SPARK-11410 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.5.1 >Reporter: Nong Li >Assignee: Nong Li > Fix For: 1.6.0 > > > DISTRIBUTE BY allows the user to control the partitioning and ordering of a > data set, which can be very useful for some applications.
[jira] [Commented] (SPARK-11410) Add a DataFrame API that provides functionality similar to HiveQL's DISTRIBUTE BY
[ https://issues.apache.org/jira/browse/SPARK-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055189#comment-15055189 ] Maciej Bryński commented on SPARK-11410: And what about this example: we can partition a table by column, then run the query: {code} select * from table where column = value {code} In this case Spark should scan only one partition. PS. Is partitioning saved in Parquet format? > Add a DataFrame API that provides functionality similar to HiveQL's > DISTRIBUTE BY > - > > Key: SPARK-11410 > URL: https://issues.apache.org/jira/browse/SPARK-11410 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.5.1 >Reporter: Nong Li >Assignee: Nong Li > Fix For: 1.6.0 > > > DISTRIBUTE BY allows the user to control the partitioning and ordering of a > data set, which can be very useful for some applications.
[jira] [Comment Edited] (SPARK-11410) Add a DataFrame API that provides functionality similar to HiveQL's DISTRIBUTE BY
[ https://issues.apache.org/jira/browse/SPARK-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055189#comment-15055189 ] Maciej Bryński edited comment on SPARK-11410 at 12/13/15 10:16 PM: --- And what about this example: we can partition a table by column, then run the query: {code} select * from table where column = value {code} In this case Spark should scan only one partition. PS. Is partitioning saved in Parquet format? was (Author: maver1ck): And what about this example: we can partition a table by column, then run the query: {code} select * from table where column = value {code} In this case Spark should scan only one partition. PS. Is partitioning saved in Parquet format? > Add a DataFrame API that provides functionality similar to HiveQL's > DISTRIBUTE BY > - > > Key: SPARK-11410 > URL: https://issues.apache.org/jira/browse/SPARK-11410 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.5.1 >Reporter: Nong Li >Assignee: Nong Li > Fix For: 1.6.0 > > > DISTRIBUTE BY allows the user to control the partitioning and ordering of a > data set, which can be very useful for some applications.
[jira] [Commented] (SPARK-11410) Add a DataFrame API that provides functionality similar to HiveQL's DISTRIBUTE BY
[ https://issues.apache.org/jira/browse/SPARK-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055188#comment-15055188 ] Yin Huai commented on SPARK-11410: -- Yes, we do. For example, if you cache the table after calling repartition, Spark SQL understands that the table has been partitioned and will try to avoid shuffling if a query requires the same column(s) to shuffle data. In the future, we will store the partitioning info in the metastore, so users can pre-shuffle data or co-partition their tables. > Add a DataFrame API that provides functionality similar to HiveQL's > DISTRIBUTE BY > - > > Key: SPARK-11410 > URL: https://issues.apache.org/jira/browse/SPARK-11410 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.5.1 >Reporter: Nong Li >Assignee: Nong Li > Fix For: 1.6.0 > > > DISTRIBUTE BY allows the user to control the partitioning and ordering of a > data set, which can be very useful for some applications.
[jira] [Commented] (SPARK-11410) Add a DataFrame API that provides functionality similar to HiveQL's DISTRIBUTE BY
[ https://issues.apache.org/jira/browse/SPARK-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055182#comment-15055182 ] Maciej Bryński commented on SPARK-11410: Is it possible for Spark to use information about partitioning to optimize queries? > Add a DataFrame API that provides functionality similar to HiveQL's > DISTRIBUTE BY > - > > Key: SPARK-11410 > URL: https://issues.apache.org/jira/browse/SPARK-11410 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.5.1 >Reporter: Nong Li >Assignee: Nong Li > Fix For: 1.6.0 > > > DISTRIBUTE BY allows the user to control the partitioning and ordering of a > data set, which can be very useful for some applications.
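What DISTRIBUTE BY (and the DataFrame repartition-by-columns API this issue added) does can be illustrated with a standalone sketch of hash routing: every row goes to the partition chosen by the hash of its key, so rows with equal keys end up co-located. The `distributeBy` function below is a toy model, not Spark's implementation.

```scala
// Toy model of DISTRIBUTE BY: route each row to the partition given by the
// hash of its key, so equal keys are co-located (useful before per-key
// aggregations or joins). floorMod keeps negative hash codes in range.
def distributeBy[K, R](rows: Seq[R], numPartitions: Int)(key: R => K): Map[Int, Seq[R]] =
  rows.groupBy(r => Math.floorMod(key(r).hashCode, numPartitions))

val rows = Seq(("us", 1), ("de", 2), ("us", 3), ("fr", 4))
val parts = distributeBy(rows, 4)(_._1)
// All "us" rows land in exactly one partition:
val usPartitions = parts.collect { case (p, rs) if rs.exists(_._1 == "us") => p }
println(usPartitions.size) // 1
```

This also illustrates the distinction drawn in the comments above: this hash distribution controls how rows are shuffled across tasks, whereas `DataFrameWriter.partitionBy` controls the on-disk directory layout used for partition pruning.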
[jira] [Comment Edited] (SPARK-12288) Support UnsafeRow in Coalesce/Except/Intersect
[ https://issues.apache.org/jira/browse/SPARK-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055110#comment-15055110 ] Xiao Li edited comment on SPARK-12288 at 12/13/15 9:49 PM: --- [~davies] It sounds like Except and Intersect can support UnsafeRow? We just need to add the following lines to Except and Intersect {code} override def outputsUnsafeRows: Boolean = children.forall(_.outputsUnsafeRows) override def canProcessUnsafeRows: Boolean = true override def canProcessSafeRows: Boolean = true {code} Is my understanding correct? Thanks! was (Author: smilegator): [~davies] It sounds like Except and Intersect can support UnsafeRow? We just need to add the following line to Except and Intersect {code} override def canProcessUnsafeRows: Boolean = true {code} Is my understanding correct? Thanks! > Support UnsafeRow in Coalesce/Except/Intersect > -- > > Key: SPARK-12288 > URL: https://issues.apache.org/jira/browse/SPARK-12288 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu >
[jira] [Closed] (SPARK-10798) JsonMappingException with Spark Context Parallelize
[ https://issues.apache.org/jira/browse/SPARK-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dev Lakhani closed SPARK-10798. --- Resolution: Cannot Reproduce > JsonMappingException with Spark Context Parallelize > --- > > Key: SPARK-10798 > URL: https://issues.apache.org/jira/browse/SPARK-10798 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.0 > Environment: Linux, Java 1.8.45 >Reporter: Dev Lakhani > > When trying to create an RDD of Rows using a Java Spark Context and if I > serialize the rows with Kryo first, the sparkContext fails. > byte[] data= Kryo.serialize(List) > List fromKryoRows=Kryo.unserialize(data) > List rows= new Vector(); //using a new set of data. > rows.add(RowFactory.create("test")); > javaSparkContext.parallelize(rows); > OR > javaSparkContext.parallelize(fromKryoRows); //using deserialized rows > I get : > com.fasterxml.jackson.databind.JsonMappingException: (None,None) (of class > scala.Tuple2) (through reference chain: > org.apache.spark.rdd.RDDOperationScope["parent"]) >at > com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:210) >at > com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:177) >at > com.fasterxml.jackson.databind.ser.std.StdSerializer.wrapAndThrow(StdSerializer.java:187) >at > com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:647) >at > com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:152) >at > com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:128) >at > com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:2881) >at > com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2338) >at > org.apache.spark.rdd.RDDOperationScope.toJson(RDDOperationScope.scala:50) >at > 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:141) >at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108) >at > org.apache.spark.SparkContext.withScope(SparkContext.scala:700) >at > org.apache.spark.SparkContext.parallelize(SparkContext.scala:714) >at > org.apache.spark.api.java.JavaSparkContext.parallelize(JavaSparkContext.scala:145) >at > org.apache.spark.api.java.JavaSparkContext.parallelize(JavaSparkContext.scala:157) >... > Caused by: scala.MatchError: (None,None) (of class scala.Tuple2) >at > com.fasterxml.jackson.module.scala.ser.OptionSerializer$$anonfun$serialize$1.apply$mcV$sp(OptionSerializerModule.scala:32) >at > com.fasterxml.jackson.module.scala.ser.OptionSerializer$$anonfun$serialize$1.apply(OptionSerializerModule.scala:32) >at > com.fasterxml.jackson.module.scala.ser.OptionSerializer$$anonfun$serialize$1.apply(OptionSerializerModule.scala:32) >at scala.Option.getOrElse(Option.scala:120) >at > com.fasterxml.jackson.module.scala.ser.OptionSerializer.serialize(OptionSerializerModule.scala:31) >at > com.fasterxml.jackson.module.scala.ser.OptionSerializer.serialize(OptionSerializerModule.scala:22) >at > com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:505) >at > com.fasterxml.jackson.module.scala.ser.OptionPropertyWriter.serializeAsField(OptionSerializerModule.scala:128) >at > com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:639) >... 19 more > I've tried updating jackson module scala to 2.6.1 but same issue. This > happens in local mode with java 1.8_45. I searched the web and this Jira for > similar issues but found nothing of interest. >
[jira] [Commented] (SPARK-10798) JsonMappingException with Spark Context Parallelize
[ https://issues.apache.org/jira/browse/SPARK-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055130#comment-15055130 ] Dev Lakhani commented on SPARK-10798: - byte[] data= Kryo.serialize(List) This is just shorthand for new Kryo().serialize(). I think this issue was a classpath issue, I was not able to reproduce it, but if it reappears I will re-open it. > JsonMappingException with Spark Context Parallelize > --- > > Key: SPARK-10798 > URL: https://issues.apache.org/jira/browse/SPARK-10798 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.0 > Environment: Linux, Java 1.8.45 >Reporter: Dev Lakhani > > When trying to create an RDD of Rows using a Java Spark Context and if I > serialize the rows with Kryo first, the sparkContext fails. > byte[] data= Kryo.serialize(List) > List fromKryoRows=Kryo.unserialize(data) > List rows= new Vector(); //using a new set of data. > rows.add(RowFactory.create("test")); > javaSparkContext.parallelize(rows); > OR > javaSparkContext.parallelize(fromKryoRows); //using deserialized rows > I get : > com.fasterxml.jackson.databind.JsonMappingException: (None,None) (of class > scala.Tuple2) (through reference chain: > org.apache.spark.rdd.RDDOperationScope["parent"]) >at > com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:210) >at > com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:177) >at > com.fasterxml.jackson.databind.ser.std.StdSerializer.wrapAndThrow(StdSerializer.java:187) >at > com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:647) >at > com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:152) >at > com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:128) >at > 
com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:2881) >at > com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2338) >at > org.apache.spark.rdd.RDDOperationScope.toJson(RDDOperationScope.scala:50) >at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:141) >at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108) >at > org.apache.spark.SparkContext.withScope(SparkContext.scala:700) >at > org.apache.spark.SparkContext.parallelize(SparkContext.scala:714) >at > org.apache.spark.api.java.JavaSparkContext.parallelize(JavaSparkContext.scala:145) >at > org.apache.spark.api.java.JavaSparkContext.parallelize(JavaSparkContext.scala:157) >... > Caused by: scala.MatchError: (None,None) (of class scala.Tuple2) >at > com.fasterxml.jackson.module.scala.ser.OptionSerializer$$anonfun$serialize$1.apply$mcV$sp(OptionSerializerModule.scala:32) >at > com.fasterxml.jackson.module.scala.ser.OptionSerializer$$anonfun$serialize$1.apply(OptionSerializerModule.scala:32) >at > com.fasterxml.jackson.module.scala.ser.OptionSerializer$$anonfun$serialize$1.apply(OptionSerializerModule.scala:32) >at scala.Option.getOrElse(Option.scala:120) >at > com.fasterxml.jackson.module.scala.ser.OptionSerializer.serialize(OptionSerializerModule.scala:31) >at > com.fasterxml.jackson.module.scala.ser.OptionSerializer.serialize(OptionSerializerModule.scala:22) >at > com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:505) >at > com.fasterxml.jackson.module.scala.ser.OptionPropertyWriter.serializeAsField(OptionSerializerModule.scala:128) >at > com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:639) >... 19 more > I've tried updating jackson module scala to 2.6.1 but same issue. This > happens in local mode with java 1.8_45. I searched the web and this Jira for > similar issues but found nothing of interest. 
> -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
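The `Kryo.serialize(...)`/`Kryo.unserialize(...)` calls in the report are shorthand, as the commenter notes. A minimal, hypothetical sketch of the intended round-trip, using plain JDK serialization as a stand-in for Kryo (the real Kryo library works through its `Output`/`Input` classes, and the `javaSparkContext.parallelize(...)` step needs a live Spark context, so it is shown only as a comment):

```java
import java.io.*;
import java.util.*;

public class RoundTripSketch {
    // Serialize a list of rows to bytes (stand-in for the report's Kryo.serialize(List)).
    static byte[] serialize(List<String> rows) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<>(rows));
        }
        return bytes.toByteArray();
    }

    // Deserialize the bytes back into a list (stand-in for Kryo.unserialize(data)).
    @SuppressWarnings("unchecked")
    static List<String> deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (List<String>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> rows = Arrays.asList("test");
        List<String> fromBytes = deserialize(serialize(rows));
        // In the report, the deserialized rows are then handed to Spark:
        //   javaSparkContext.parallelize(fromBytes);  // fails with JsonMappingException
        System.out.println(fromBytes);
    }
}
```

Note that the stack trace points at jackson-module-scala's `OptionSerializer` hitting a `MatchError` on a `(None,None)` tuple while `RDDOperationScope.toJson` runs, i.e. the failure occurs in Spark's scope-JSON generation, not in the user's serialization round-trip itself.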
[jira] [Comment Edited] (SPARK-12288) Support UnsafeRow in Coalesce/Except/Intersect
[ https://issues.apache.org/jira/browse/SPARK-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055110#comment-15055110 ] Xiao Li edited comment on SPARK-12288 at 12/13/15 8:27 PM: --- [~davies] It sounds like Except and Intersect can support UnsafeRow? We just need to add the following line to Except and Intersect {code} override def canProcessUnsafeRows: Boolean = true {code} Is my understanding correct? Thanks! was (Author: smilegator): [~davies] It sounds like Except and Intersect can support UnsafeRow? We just need to add the following line to Except and Intersect {code} override def canProcessUnsafeRows: Boolean = true {code} Thanks! > Support UnsafeRow in Coalesce/Except/Intersect > -- > > Key: SPARK-12288 > URL: https://issues.apache.org/jira/browse/SPARK-12288 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12288) Support UnsafeRow in Coalesce/Except/Intersect
[ https://issues.apache.org/jira/browse/SPARK-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055110#comment-15055110 ] Xiao Li commented on SPARK-12288: - [~davies] It sounds like Except and Intersect can support UnsafeRow? We just need to add the following line to Except and Intersect {code} override def canProcessUnsafeRows: Boolean = true {code} Thanks! > Support UnsafeRow in Coalesce/Except/Intersect > -- > > Key: SPARK-12288 > URL: https://issues.apache.org/jira/browse/SPARK-12288 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12311) [CORE] Restore previous value of "os.arch" property in test suites after forcing to set specific value to "os.arch" property
[ https://issues.apache.org/jira/browse/SPARK-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055073#comment-15055073 ] Sean Owen commented on SPARK-12311: --- LGTM, feel free to make a pull request. > [CORE] Restore previous value of "os.arch" property in test suites after > forcing to set specific value to "os.arch" property > > > Key: SPARK-12311 > URL: https://issues.apache.org/jira/browse/SPARK-12311 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2 >Reporter: Kazuaki Ishizaki >Priority: Minor > > Although current BlockManagerSuite.scala and SizeEstimatorSuite.scala set the > specific value (e.g. "amd64") into system property "os.arch", they do not > restore the original value of "os.arch" after these test suites. This may > lead to failures in a test case that depends on architecture on other > platform rather than amd64. > They should save the original value of "os.arch" and restore this at the end > of these test suites. > > https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala > https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/util/SizeEstimatorSuite.scala -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12312) JDBC connection to Kerberos secured databases fails on remote executors
[ https://issues.apache.org/jira/browse/SPARK-12312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nabacg updated SPARK-12312: --- Description: When loading DataFrames from JDBC datasource with Kerberos authentication, remote executors (yarn-client/cluster etc. modes) fail to establish a connection due to lack of Kerberos ticket or ability to generate it. This is a real issue when trying to ingest data from kerberized data sources (SQL Server, Oracle) in enterprise environment where exposing simple authentication access is not an option due to IT policy issues. was:When loading DataFrames from JDBC datasource with Kerberos authentication (SQL Server, Oracle), remote executors (yarn-client/cluster etc. modes) fail to establish a connection due to lack of Kerberos ticket or ability to generate it. > JDBC connection to Kerberos secured databases fails on remote executors > --- > > Key: SPARK-12312 > URL: https://issues.apache.org/jira/browse/SPARK-12312 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.2 >Reporter: nabacg >Priority: Minor > > When loading DataFrames from JDBC datasource with Kerberos authentication, > remote executors (yarn-client/cluster etc. modes) fail to establish a > connection due to lack of Kerberos ticket or ability to generate it. > This is a real issue when trying to ingest data from kerberized data sources > (SQL Server, Oracle) in enterprise environment where exposing simple > authentication access is not an option due to IT policy issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12312) JDBC connection to Kerberos secured databases fails on remote executors
nabacg created SPARK-12312: -- Summary: JDBC connection to Kerberos secured databases fails on remote executors Key: SPARK-12312 URL: https://issues.apache.org/jira/browse/SPARK-12312 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.2 Reporter: nabacg Priority: Minor When loading DataFrames from a JDBC data source with Kerberos authentication (SQL Server, Oracle), remote executors (yarn-client/cluster etc. modes) fail to establish a connection because they lack a Kerberos ticket or the ability to generate one. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12293) Support UnsafeRow in LocalTableScan
[ https://issues.apache.org/jira/browse/SPARK-12293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12293: Assignee: Apache Spark > Support UnsafeRow in LocalTableScan > --- > > Key: SPARK-12293 > URL: https://issues.apache.org/jira/browse/SPARK-12293 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12293) Support UnsafeRow in LocalTableScan
[ https://issues.apache.org/jira/browse/SPARK-12293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12293: Assignee: (was: Apache Spark) > Support UnsafeRow in LocalTableScan > --- > > Key: SPARK-12293 > URL: https://issues.apache.org/jira/browse/SPARK-12293 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12293) Support UnsafeRow in LocalTableScan
[ https://issues.apache.org/jira/browse/SPARK-12293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055019#comment-15055019 ] Apache Spark commented on SPARK-12293: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/10283 > Support UnsafeRow in LocalTableScan > --- > > Key: SPARK-12293 > URL: https://issues.apache.org/jira/browse/SPARK-12293 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12311) [CORE] Restore previous value of "os.arch" property in test suites after forcing to set specific value to "os.arch" property
[ https://issues.apache.org/jira/browse/SPARK-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki updated SPARK-12311: - Description: Although current BlockManagerSuite.scala and SizeEstimatorSuite.scala set the specific value (e.g. "amd64") into system property "os.arch", they do not restore the original value of "os.arch" after these test suites. This may lead to failures in a test case that depends on architecture on other platform rather than amd64. They should save the original value of "os.arch" and restore this at the end of these test suites. https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/util/SizeEstimatorSuite.scala was: Although current BlockManagerSuite.scala and SizeEstimatorSuite.scala set the specific value (e.g. "amd64") into system property "os.arch", they do not restore the original value of "os.arch" after these test suites. This may lead to failures in test cases that depends on architecture on other platform rather than amd64. They should save the original value of "os.arch" and restore this at the end of these test suites. https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/util/SizeEstimatorSuite.scala > [CORE] Restore previous value of "os.arch" property in test suites after > forcing to set specific value to "os.arch" property > > > Key: SPARK-12311 > URL: https://issues.apache.org/jira/browse/SPARK-12311 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2 >Reporter: Kazuaki Ishizaki >Priority: Minor > > Although current BlockManagerSuite.scala and SizeEstimatorSuite.scala set the > specific value (e.g. 
"amd64") into system property "os.arch", they do not > restore the original value of "os.arch" after these test suites. This may > lead to failures in a test case that depends on architecture on other > platform rather than amd64. > They should save the original value of "os.arch" and restore this at the end > of these test suites. > > https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala > https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/util/SizeEstimatorSuite.scala -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12311) [CORE] Restore previous value of "os.arch" property in test suites after forcing to set specific value to "os.arch" property
[ https://issues.apache.org/jira/browse/SPARK-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki updated SPARK-12311: - Summary: [CORE] Restore previous value of "os.arch" property in test suites after forcing to set specific value to "os.arch" property (was: [CORE] Restore previous "os.arch" property in test suites after forcing to set specific value to "os.arch" property) > [CORE] Restore previous value of "os.arch" property in test suites after > forcing to set specific value to "os.arch" property > > > Key: SPARK-12311 > URL: https://issues.apache.org/jira/browse/SPARK-12311 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2 >Reporter: Kazuaki Ishizaki >Priority: Minor > > Although current BlockManagerSuite.scala and SizeEstimatorSuite.scala set the > specific value (e.g. "amd64") into system property "os.arch", they do not > restore the original value of "os.arch" after these test suites. This may > lead to failures in test cases that depends on architecture on other platform > rather than amd64. > They should save the original value of "os.arch" and restore this at the end > of these test suites. > > https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala > https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/util/SizeEstimatorSuite.scala -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12311) [CORE] Restore previous "os.arch" property in test suites after forcing to set specific value to "os.arch" property
Kazuaki Ishizaki created SPARK-12311: Summary: [CORE] Restore previous "os.arch" property in test suites after forcing to set specific value to "os.arch" property Key: SPARK-12311 URL: https://issues.apache.org/jira/browse/SPARK-12311 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.5.2 Reporter: Kazuaki Ishizaki Priority: Minor Although BlockManagerSuite.scala and SizeEstimatorSuite.scala currently set a specific value (e.g. "amd64") into the system property "os.arch", they do not restore the original value of "os.arch" after these test suites run. This may lead to failures in test cases that depend on the architecture, on platforms other than amd64. They should save the original value of "os.arch" and restore it at the end of these test suites. https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/util/SizeEstimatorSuite.scala -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
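The save-and-restore pattern the issue asks for can be sketched in plain Java. The helper below is a hypothetical illustration of the proposal, not Spark's actual test code; only the property name `"os.arch"` and the example value `"amd64"` come from the issue:

```java
public class OsArchRestoreSketch {
    // Run a body under a forced "os.arch" value, then restore the original,
    // as the issue proposes for BlockManagerSuite/SizeEstimatorSuite.
    static void withOsArch(String forced, Runnable body) {
        String original = System.getProperty("os.arch");
        System.setProperty("os.arch", forced);
        try {
            body.run();
        } finally {
            // Restore even if the body throws, so later tests see the real value.
            if (original == null) {
                System.clearProperty("os.arch");
            } else {
                System.setProperty("os.arch", original);
            }
        }
    }

    public static void main(String[] args) {
        String before = System.getProperty("os.arch");
        withOsArch("amd64", () ->
            System.out.println("inside: " + System.getProperty("os.arch")));
        System.out.println("restored: " + System.getProperty("os.arch").equals(before));
    }
}
```

The `finally` block is the essential part: without it, a failing assertion inside the suite would leave the forced value behind for every subsequent test.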
[jira] [Commented] (SPARK-12306) Add an option to ignore BlockRDD partition data loss
[ https://issues.apache.org/jira/browse/SPARK-12306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15054995#comment-15054995 ] Liwei Lin commented on SPARK-12306: --- Sorry for the inconvenience; I've added a more detailed description, [~srowen] would you mind reopening it? Thanks. :-) > Add an option to ignore BlockRDD partition data loss > > > Key: SPARK-12306 > URL: https://issues.apache.org/jira/browse/SPARK-12306 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.5.2 >Reporter: Liwei Lin > > Currently in Spark Streaming, a Receiver stores the received data into a > BlockManager, and the data is later used by a BlockRDD. If this BlockManager > is lost because of some failure, the BlockRDD throws a SparkException saying > "Could not compute split, block not found". > In most cases this is the right thing to do. However, in a streaming scenario > that can tolerate small pieces of data loss, moving on silently instead of > throwing an exception may be preferable. > This issue proposes adding such a "spark.streaming.ignoreBlockNotFound" > option, defaulting to false, to tell whether to throw an exception or just > move on when a block is not found. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
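The proposed flag boils down to a throw-or-skip decision at block lookup time. A rough sketch of that decision in plain Java; the store, method, and exception type here are illustrative, not Spark's BlockRDD code, and only the property name `spark.streaming.ignoreBlockNotFound` and the error message come from the issue:

```java
import java.util.*;

public class IgnoreBlockNotFoundSketch {
    // Look up a block's data; if the block is missing, either treat the
    // partition as empty (tolerate the loss) or fail, per the proposed flag.
    static List<String> compute(Map<String, List<String>> store,
                                String blockId,
                                boolean ignoreBlockNotFound) {
        List<String> data = store.get(blockId);
        if (data != null) {
            return data;
        }
        if (ignoreBlockNotFound) {
            return Collections.emptyList(); // move on silently
        }
        // Default behavior (flag off): fail, as BlockRDD does today.
        throw new IllegalStateException(
            "Could not compute split, block not found: " + blockId);
    }

    public static void main(String[] args) {
        Map<String, List<String>> store = new HashMap<>();
        store.put("input-0-1", Arrays.asList("a", "b"));
        System.out.println(compute(store, "input-0-1", false)); // block present
        System.out.println(compute(store, "input-0-2", true));  // missing, flag on
    }
}
```

Defaulting the flag to false, as the issue proposes, preserves the current fail-fast behavior unless a user explicitly opts in to tolerating loss.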
[jira] [Updated] (SPARK-12306) Add an option to ignore BlockRDD partition data loss
[ https://issues.apache.org/jira/browse/SPARK-12306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liwei Lin updated SPARK-12306: -- Description: Currently in Spark Streaming, a Receiver stores the received data into some BlockManager and then later the data will be used by a BlockRDD. If this BlockManager were to lost because of some failure, then this BlockRDD would throw a SparkException saying "Could not compute split, block not found". In most cases this is the right thing to do. However, in a streaming scenario where it can tolerant small pieces of data loss, maybe just move on silently -- instead of throwing an exception -- is more preferable. This issue proposes to add such an "spark.streaming.ignoreBlockNotFound" option, which defaults to false, to tell whether to throw an exception or just move on when a block is not found. was: Currently in Spark Streaming, a Receiver stores the received data into some BlockManager and then later the data will be used by a BlockRDD. If this BlockManager were to lost because of some failure, then this BlockRDD would throw a SparkException with the "Could not compute split, block not found" message. In most cases this is the right thing to do. But in a streaming scenario where it can tolerant small pieces of data loss, maybe just move on silently -- instead of throwing an exception -- is more preferable. This issue proposes to add such an "spark.streaming.ignoreBlockNotFound" option, which defaults to false, the tell whether to throw an exception or just move on when a block is not found. > Add an option to ignore BlockRDD partition data loss > > > Key: SPARK-12306 > URL: https://issues.apache.org/jira/browse/SPARK-12306 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.5.2 >Reporter: Liwei Lin > > Currently in Spark Streaming, a Receiver stores the received data into some > BlockManager and then later the data will be used by a BlockRDD. 
If this > BlockManager were to lost because of some failure, then this BlockRDD would > throw a SparkException saying "Could not compute split, block not found". > In most cases this is the right thing to do. However, in a streaming scenario > where it can tolerant small pieces of data loss, maybe just move on silently > -- instead of throwing an exception -- is more preferable. > This issue proposes to add such an "spark.streaming.ignoreBlockNotFound" > option, which defaults to false, to tell whether to throw an exception or > just move on when a block is not found. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12306) Add an option to ignore BlockRDD partition data loss
[ https://issues.apache.org/jira/browse/SPARK-12306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liwei Lin updated SPARK-12306: -- Description: Currently in Spark Streaming, a Receiver stores the received data into some BlockManager and then later the data will be used by a BlockRDD. If this BlockManager were to lost because of some failure, then this BlockRDD would throw a SparkException with the "Could not compute split, block not found" message. In most cases this is the right thing to do. But in a streaming scenario where it can tolerant small pieces of data loss, maybe just move on silently -- instead of throwing an exception -- is more preferable. This issue proposes to add such an "spark.streaming.ignoreBlockNotFound" option, which defaults to false, the tell whether to throw an exception or just move on when a block is not found. > Add an option to ignore BlockRDD partition data loss > > > Key: SPARK-12306 > URL: https://issues.apache.org/jira/browse/SPARK-12306 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.5.2 >Reporter: Liwei Lin > > Currently in Spark Streaming, a Receiver stores the received data into some > BlockManager and then later the data will be used by a BlockRDD. If this > BlockManager were to lost because of some failure, then this BlockRDD would > throw a SparkException with the "Could not compute split, block not found" > message. > In most cases this is the right thing to do. But in a streaming scenario > where it can tolerant small pieces of data loss, maybe just move on silently > -- instead of throwing an exception -- is more preferable. > This issue proposes to add such an "spark.streaming.ignoreBlockNotFound" > option, which defaults to false, the tell whether to throw an exception or > just move on when a block is not found. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12305) Add Receiver scheduling info onto Spark Streaming web UI
[ https://issues.apache.org/jira/browse/SPARK-12305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15054983#comment-15054983 ] Liwei Lin commented on SPARK-12305: --- Sorry for the inconvenience; I've added a more detailed description, [~srowen] would you mind reopening it? Thanks. :-) > Add Receiver scheduling info onto Spark Streaming web UI > > > Key: SPARK-12305 > URL: https://issues.apache.org/jira/browse/SPARK-12305 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.5.2 >Reporter: Liwei Lin >Priority: Minor > > Spark 1.5 has added better Receiver scheduling support, via which users can > deploy Receivers to certain Executors in a way they wish. > However, neither 'Receiver.preferredLocations' info nor the candidate > Executors info are displayed on the web UI. Then when Receivers are not > scheduled in the way users have specified, it's non-trivial for the users to > find out why. > This issue proposes to add Receiver scheduling info, including > 'Receiver.preferredLocations' info as well as the candidate Executors info, > onto Spark Streaming web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12305) Add Receiver scheduling info onto Spark Streaming web UI
[ https://issues.apache.org/jira/browse/SPARK-12305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liwei Lin updated SPARK-12305: -- Description: Spark 1.5 has added better Receiver scheduling support, via which users can deploy Receivers to certain Executors in a way they wish. However, neither 'Receiver.preferredLocations' info nor the candidate Executors info are displayed on the web UI. Then when Receivers are not scheduled in the way users have specified, it's non-trivial for the users to find out why. This issue proposes to add Receiver scheduling info, including 'Receiver.preferredLocations' info as well as the candidate Executors info, onto Spark Streaming web UI. was:Spark 1.5 has added better Receiver scheduling support, > Add Receiver scheduling info onto Spark Streaming web UI > > > Key: SPARK-12305 > URL: https://issues.apache.org/jira/browse/SPARK-12305 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.5.2 >Reporter: Liwei Lin >Priority: Minor > > Spark 1.5 has added better Receiver scheduling support, via which users can > deploy Receivers to certain Executors in a way they wish. > However, neither 'Receiver.preferredLocations' info nor the candidate > Executors info are displayed on the web UI. Then when Receivers are not > scheduled in the way users have specified, it's non-trivial for the users to > find out why. > This issue proposes to add Receiver scheduling info, including > 'Receiver.preferredLocations' info as well as the candidate Executors info, > onto Spark Streaming web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12305) Add Receiver scheduling info onto Spark Streaming web UI
[ https://issues.apache.org/jira/browse/SPARK-12305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liwei Lin updated SPARK-12305: -- Description: Spark 1.5 has added better Receiver scheduling support, > Add Receiver scheduling info onto Spark Streaming web UI > > > Key: SPARK-12305 > URL: https://issues.apache.org/jira/browse/SPARK-12305 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.5.2 >Reporter: Liwei Lin >Priority: Minor > > Spark 1.5 has added better Receiver scheduling support, -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12310) Add write.json and write.parquet for SparkR
[ https://issues.apache.org/jira/browse/SPARK-12310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12310: Assignee: Apache Spark > Add write.json and write.parquet for SparkR > --- > > Key: SPARK-12310 > URL: https://issues.apache.org/jira/browse/SPARK-12310 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Reporter: Yanbo Liang >Assignee: Apache Spark > > Add write.json and write.parquet for SparkR -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12309) Use sqlContext from MLlibTestSparkContext for spark.ml test suites
[ https://issues.apache.org/jira/browse/SPARK-12309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12309: Assignee: (was: Apache Spark) > Use sqlContext from MLlibTestSparkContext for spark.ml test suites > -- > > Key: SPARK-12309 > URL: https://issues.apache.org/jira/browse/SPARK-12309 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Yanbo Liang > > Use sqlContext from MLlibTestSparkContext rather than creating new one for > spark.ml test cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12309) Use sqlContext from MLlibTestSparkContext for spark.ml test suites
[ https://issues.apache.org/jira/browse/SPARK-12309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12309: Assignee: Apache Spark > Use sqlContext from MLlibTestSparkContext for spark.ml test suites > -- > > Key: SPARK-12309 > URL: https://issues.apache.org/jira/browse/SPARK-12309 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Yanbo Liang >Assignee: Apache Spark > > Use sqlContext from MLlibTestSparkContext rather than creating new one for > spark.ml test cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-12306) Add an option to ignore BlockRDD partition data loss
[ https://issues.apache.org/jira/browse/SPARK-12306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-12306. --- Resolution: Invalid > Add an option to ignore BlockRDD partition data loss > > > Key: SPARK-12306 > URL: https://issues.apache.org/jira/browse/SPARK-12306 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.5.2 >Reporter: Liwei Lin > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12282) Document spark.jars
[ https://issues.apache.org/jira/browse/SPARK-12282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15054927#comment-15054927 ] Sean Owen commented on SPARK-12282: --- Yeah, I suspect it happens to work fine, since that's what --jars sets too. I don't think it's tested/guaranteed as an API for all modes. I can see wanting to make everything a conf value, but lots of things aren't at this stage anyway. (And if you really want to, you can set it this way if you're OK with it maybe not working in a future version.) I don't think it adds up to a need to expose this property. > Document spark.jars > --- > > Key: SPARK-12282 > URL: https://issues.apache.org/jira/browse/SPARK-12282 > Project: Spark > Issue Type: Documentation > Components: Documentation >Reporter: Justin Bailey >Priority: Trivial > > The spark.jars property (as implemented in SparkSubmit.scala, > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L516) > is not documented anywhere, and should be. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-12305) Add Receiver scheduling info onto Spark Streaming web UI
[ https://issues.apache.org/jira/browse/SPARK-12305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-12305. --- Resolution: Invalid > Add Receiver scheduling info onto Spark Streaming web UI > > > Key: SPARK-12305 > URL: https://issues.apache.org/jira/browse/SPARK-12305 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.5.2 >Reporter: Liwei Lin >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-12238) s/Advanced sources/External Sources in docs.
[ https://issues.apache.org/jira/browse/SPARK-12238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-12238. --- Resolution: Won't Fix > s/Advanced sources/External Sources in docs. > > > Key: SPARK-12238 > URL: https://issues.apache.org/jira/browse/SPARK-12238 > Project: Spark > Issue Type: Improvement > Components: Documentation, Streaming >Reporter: Prashant Sharma > > While reading the docs, I felt reading as external sources(instead of > Advanced sources) seemed more appropriate as in they belong outside streaming > core project. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12260) Graceful Shutdown with In-Memory State
[ https://issues.apache.org/jira/browse/SPARK-12260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15054923#comment-15054923 ] Sean Owen commented on SPARK-12260: --- I'm asking what you can't do with updateStateByKey that you need to do? This sounds pretty app-specific, and something you can maintain in your own app or repo to start. > Graceful Shutdown with In-Memory State > -- > > Key: SPARK-12260 > URL: https://issues.apache.org/jira/browse/SPARK-12260 > Project: Spark > Issue Type: New Feature > Components: Streaming >Reporter: Mao, Wei > Labels: streaming > > Users often stop and restart their streaming jobs for tasks such as > maintenance, software upgrades or even application logic updates. When a job > re-starts it should pick up where it left off i.e. any state information that > existed when the job stopped should be used as the initial state when the job > restarts. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-11707) StreamCorruptedException if authentication is enabled
[ https://issues.apache.org/jira/browse/SPARK-11707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-11707.
-------------------------------
    Resolution: Cannot Reproduce

> StreamCorruptedException if authentication is enabled
>
> Key: SPARK-11707
> URL: https://issues.apache.org/jira/browse/SPARK-11707
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Reporter: Jacek Lewandowski
>
> When authentication (and encryption) is enabled (at least in standalone mode), the following code (in the Spark shell):
> {code}
> sc.makeRDD(1 to 10, 10).map(x => x*x).map(_.toString).reduce(_ + _)
> {code}
> finishes with an exception:
> {noformat}
> [Stage 0:> (0 + 8) / 10]15/11/12 20:36:29 ERROR TransportRequestHandler: Error while invoking RpcHandler#receive() on RPC id 5750598674048943239
> java.io.StreamCorruptedException: invalid type code: 30
>     at java.io.ObjectInputStream$BlockDataInputStream.readBlockHeader(ObjectInputStream.java:2508)
>     at java.io.ObjectInputStream$BlockDataInputStream.refill(ObjectInputStream.java:2543)
>     at java.io.ObjectInputStream$BlockDataInputStream.read(ObjectInputStream.java:2702)
>     at java.io.ObjectInputStream.read(ObjectInputStream.java:865)
>     at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
>     at org.apache.spark.util.SerializableBuffer$$anonfun$readObject$1.apply(SerializableBuffer.scala:38)
>     at org.apache.spark.util.SerializableBuffer$$anonfun$readObject$1.apply(SerializableBuffer.scala:32)
>     at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1186)
>     at org.apache.spark.util.SerializableBuffer.readObject(SerializableBuffer.scala:32)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
>     at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:109)
>     at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$deserialize$1.apply(NettyRpcEnv.scala:248)
>     at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
>     at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:296)
>     at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:247)
>     at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:448)
>     at org.apache.spark.network.sasl.SaslRpcHandler.receive(SaslRpcHandler.java:76)
>     at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:122)
>     at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:94)
>     at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:101)
>     at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
>     at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>     at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
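The one-line repro from the report can also be run locally, without Spark, to show what the job computes when deserialization succeeds; the failure above happens in the RPC layer, not in this logic. Note the reduce is string concatenation, which is associative but not commutative, so in a distributed run the result depends on partition ordering:

```scala
// Local equivalent of the Spark-shell snippet from the report:
// square 1..10, stringify, and concatenate in order.
val result = (1 to 10).map(x => x * x).map(_.toString).reduce(_ + _)
// concatenation of "1", "4", "9", ..., "100"
```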
[jira] [Updated] (SPARK-12284) Output UnsafeRow from window function
[ https://issues.apache.org/jira/browse/SPARK-12284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-12284:
------------------------------
    Component/s: SQL

> Output UnsafeRow from window function
>
> Key: SPARK-12284
> URL: https://issues.apache.org/jira/browse/SPARK-12284
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Davies Liu
[jira] [Updated] (SPARK-12295) Manage the memory used by window function
[ https://issues.apache.org/jira/browse/SPARK-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-12295:
------------------------------
    Component/s: SQL

> Manage the memory used by window function
>
> Key: SPARK-12295
> URL: https://issues.apache.org/jira/browse/SPARK-12295
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Davies Liu
>
> The buffered rows for a given frame should use UnsafeRow, and be stored as pages.
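The ticket's one-line description (buffer a frame's rows as pages rather than one growing array) can be illustrated with a small, purely hypothetical sketch; the class and method names below are invented for illustration and are not Spark's actual window-exec internals:

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch: buffer rows for a window frame in fixed-size pages,
// so memory can be accounted for and released per page instead of per buffer.
class PagedRowBuffer[T](pageSize: Int) {
  private val pages = ArrayBuffer[ArrayBuffer[T]]()

  def add(row: T): Unit = {
    // Start a new page when the current one is full (or none exists yet).
    if (pages.isEmpty || pages.last.size == pageSize) pages += ArrayBuffer[T]()
    pages.last += row
  }

  // Random access by global index: locate the page, then the slot within it.
  def apply(i: Int): T = pages(i / pageSize)(i % pageSize)

  def size: Int = pages.map(_.size).sum
}
```

The point of the design is that each page is a fixed-size allocation unit, which maps naturally onto a memory manager that grants and reclaims pages.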
[jira] [Updated] (SPARK-12289) Support UnsafeRow in TakeOrderedAndProject/Limit
[ https://issues.apache.org/jira/browse/SPARK-12289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-12289:
------------------------------
    Component/s: SQL

> Support UnsafeRow in TakeOrderedAndProject/Limit
>
> Key: SPARK-12289
> URL: https://issues.apache.org/jira/browse/SPARK-12289
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Davies Liu
[jira] [Updated] (SPARK-12287) Support UnsafeRow in MapPartitions/MapGroups/CoGroup
[ https://issues.apache.org/jira/browse/SPARK-12287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-12287:
------------------------------
    Component/s: SQL

> Support UnsafeRow in MapPartitions/MapGroups/CoGroup
>
> Key: SPARK-12287
> URL: https://issues.apache.org/jira/browse/SPARK-12287
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Davies Liu
[jira] [Updated] (SPARK-12291) Support UnsafeRow in BroadcastLeftSemiJoinHash
[ https://issues.apache.org/jira/browse/SPARK-12291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-12291:
------------------------------
    Component/s: SQL

> Support UnsafeRow in BroadcastLeftSemiJoinHash
>
> Key: SPARK-12291
> URL: https://issues.apache.org/jira/browse/SPARK-12291
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Davies Liu
[jira] [Updated] (SPARK-12288) Support UnsafeRow in Coalesce/Except/Intersect
[ https://issues.apache.org/jira/browse/SPARK-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-12288:
------------------------------
    Component/s: SQL

> Support UnsafeRow in Coalesce/Except/Intersect
>
> Key: SPARK-12288
> URL: https://issues.apache.org/jira/browse/SPARK-12288
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Davies Liu
[jira] [Updated] (SPARK-12293) Support UnsafeRow in LocalTableScan
[ https://issues.apache.org/jira/browse/SPARK-12293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-12293:
------------------------------
    Component/s: SQL

> Support UnsafeRow in LocalTableScan
>
> Key: SPARK-12293
> URL: https://issues.apache.org/jira/browse/SPARK-12293
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Davies Liu
[jira] [Updated] (SPARK-12290) Change the default value in SparkPlan
[ https://issues.apache.org/jira/browse/SPARK-12290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-12290:
------------------------------
    Component/s: SQL

> Change the default value in SparkPlan
>
> Key: SPARK-12290
> URL: https://issues.apache.org/jira/browse/SPARK-12290
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Davies Liu
>
> supportUnsafeRows = true
> supportSafeRows = false
> outputUnsafeRows = true
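The three flags listed in this ticket's description can be read as proposed trait defaults: operators process and output UnsafeRow by default, and safe-row-only operators must opt out. The trait and member names below are a hypothetical illustration mirroring the ticket text, not the actual SparkPlan API:

```scala
// Hypothetical sketch of the proposed defaults for a physical-plan trait.
trait PlanRowDefaults {
  def supportUnsafeRows: Boolean = true   // operators handle UnsafeRow by default
  def supportSafeRows: Boolean = false    // safe (non-Unsafe) rows are the exception
  def outputUnsafeRows: Boolean = true    // output format defaults to UnsafeRow
}

// An operator that still needs safe rows would override the defaults:
object LegacyScan extends PlanRowDefaults {
  override def supportSafeRows: Boolean = true
  override def outputUnsafeRows: Boolean = false
}

object DefaultOperator extends PlanRowDefaults
```

Flipping the defaults this way means new operators get the UnsafeRow fast path for free, and only the shrinking set of legacy operators carry overrides.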
[jira] [Updated] (SPARK-12292) Support UnsafeRow in Generate
[ https://issues.apache.org/jira/browse/SPARK-12292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-12292:
------------------------------
    Component/s: SQL

> Support UnsafeRow in Generate
>
> Key: SPARK-12292
> URL: https://issues.apache.org/jira/browse/SPARK-12292
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Davies Liu
[jira] [Updated] (SPARK-12294) Support UnsafeRow in HiveTableScan
[ https://issues.apache.org/jira/browse/SPARK-12294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-12294:
------------------------------
    Component/s: SQL

> Support UnsafeRow in HiveTableScan
>
> Key: SPARK-12294
> URL: https://issues.apache.org/jira/browse/SPARK-12294
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Davies Liu
[jira] [Created] (SPARK-12310) Add write.json and write.parquet for SparkR
Yanbo Liang created SPARK-12310:
--------------------------------
    Summary: Add write.json and write.parquet for SparkR
    Key: SPARK-12310
    URL: https://issues.apache.org/jira/browse/SPARK-12310
    Project: Spark
    Issue Type: Sub-task
    Components: SparkR
    Reporter: Yanbo Liang

Add write.json and write.parquet for SparkR