[jira] [Updated] (SPARK-42090) Introduce sasl retry count in RetryingBlockTransferor
[ https://issues.apache.org/jira/browse/SPARK-42090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated SPARK-42090:
---------------------------
    Description: 
Previously a boolean variable, saslTimeoutSeen, was used in RetryingBlockTransferor. However, the boolean variable wouldn't cover the following scenario:
1. SaslTimeoutException
2. IOException
3. SaslTimeoutException
4. IOException
Even though IOException at #2 is retried (resulting in increment of retryCount), the retryCount would be cleared at step #4.
Since the intention of saslTimeoutSeen is to undo the increment due to retrying SaslTimeoutException, we should keep a counter for SaslTimeoutException retries and subtract the value of this counter from retryCount.

  was:
Previously a boolean variable, saslTimeoutSeen, was used. However, the boolean variable wouldn't cover the following scenario:
1. SaslTimeoutException
2. IOException
3. SaslTimeoutException
4. IOException
Even though IOException at #2 is retried (resulting in increment of retryCount), the retryCount would be cleared at step #4.
Since the intention of saslTimeoutSeen is to undo the increment due to retrying SaslTimeoutException, we should keep a counter for SaslTimeoutException retries and subtract the value of this counter from retryCount.

> Introduce sasl retry count in RetryingBlockTransferor
> -----------------------------------------------------
>
>                 Key: SPARK-42090
>                 URL: https://issues.apache.org/jira/browse/SPARK-42090
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Ted Yu
>            Priority: Major

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42090) Introduce sasl retry count in RetryingBlockTransferor
Ted Yu created SPARK-42090:
---------------------------
             Summary: Introduce sasl retry count in RetryingBlockTransferor
                 Key: SPARK-42090
                 URL: https://issues.apache.org/jira/browse/SPARK-42090
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.4.0
            Reporter: Ted Yu

Previously a boolean variable, saslTimeoutSeen, was used. However, the boolean variable wouldn't cover the following scenario:
1. SaslTimeoutException
2. IOException
3. SaslTimeoutException
4. IOException
Even though IOException at #2 is retried (resulting in increment of retryCount), the retryCount would be cleared at step #4.
Since the intention of saslTimeoutSeen is to undo the increment due to retrying SaslTimeoutException, we should keep a counter for SaslTimeoutException retries and subtract the value of this counter from retryCount.
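The counting scheme described in the issue can be sketched as follows. This is a hedged illustration, not Spark's actual code: the class and method names (RetryCounterSketch, onRetry, onSaslSuccess, SaslTimeoutExceptionStub) are stand-ins; only the idea of keeping a separate SASL-timeout retry counter and subtracting it, rather than clearing retryCount outright, comes from the issue text.

```java
// Illustrative stand-in for Spark's SaslTimeoutException.
class SaslTimeoutExceptionStub extends RuntimeException {}

class RetryCounterSketch {
    int retryCount = 0;      // all retries performed so far
    int saslRetryCount = 0;  // subset of retries caused by SASL timeouts

    void onRetry(Exception cause) {
        retryCount++;
        if (cause instanceof SaslTimeoutExceptionStub) {
            saslRetryCount++;
        }
    }

    // Old behavior: retryCount = 0 here, which also erased the IOException
    // retry recorded at step #2 of the scenario. New behavior: subtract only
    // the SASL-timeout-induced increments.
    void onSaslSuccess() {
        retryCount -= saslRetryCount;
        saslRetryCount = 0;
    }
}
```

Running the four-step scenario from the issue against this sketch leaves retryCount at 1 (the IOException retry from step #2 survives), whereas the boolean-flag approach would have cleared it to 0.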
[jira] [Created] (SPARK-41853) Use Map in place of SortedMap for ErrorClassesJsonReader
Ted Yu created SPARK-41853:
---------------------------
             Summary: Use Map in place of SortedMap for ErrorClassesJsonReader
                 Key: SPARK-41853
                 URL: https://issues.apache.org/jira/browse/SPARK-41853
             Project: Spark
          Issue Type: Task
          Components: Spark Core
    Affects Versions: 3.2.3
            Reporter: Ted Yu

The use of SortedMap in ErrorClassesJsonReader was mostly for making tests easier to write.
This PR replaces SortedMap with Map since SortedMap is slower compared to Map.
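The trade-off behind this change can be illustrated with the JDK's own map types (this is not ErrorClassesJsonReader itself; ErrorClassLookup and messageFor are hypothetical names): a TreeMap, i.e. a SortedMap, gives deterministic key order, which is convenient for stable test output, but pays O(log n) per lookup, while a HashMap averages O(1) at the cost of unspecified iteration order.

```java
import java.util.Map;

class ErrorClassLookup {
    // Works identically with a TreeMap or a HashMap; only iteration order
    // and per-lookup cost differ between the two implementations.
    static String messageFor(Map<String, String> errorClasses, String errorClass) {
        return errorClasses.getOrDefault(errorClass, "<unknown error class>");
    }
}
```

Since the reader only ever looks messages up by key and never relies on ordering outside of tests, the unordered map is the cheaper choice.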
[jira] [Created] (SPARK-41705) Move generate_protos.sh to dev/
Ted Yu created SPARK-41705:
---------------------------
             Summary: Move generate_protos.sh to dev/
                 Key: SPARK-41705
                 URL: https://issues.apache.org/jira/browse/SPARK-41705
             Project: Spark
          Issue Type: Task
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Ted Yu

connector/connect/dev only contains one script. Moving generate_protos.sh to dev follows practice for other scripts.
[jira] [Created] (SPARK-41419) [K8S] Decrement PVC_COUNTER when the pod deletion happens
Ted Yu created SPARK-41419:
---------------------------
             Summary: [K8S] Decrement PVC_COUNTER when the pod deletion happens
                 Key: SPARK-41419
                 URL: https://issues.apache.org/jira/browse/SPARK-41419
             Project: Spark
          Issue Type: Task
          Components: Kubernetes
    Affects Versions: 3.4.0
            Reporter: Ted Yu

Commit cc55de3 introduced PVC_COUNTER to track the outstanding number of PVCs.
PVC_COUNTER should only be decremented when the pod deletion happens (in response to an error). If the PVC wasn't created successfully (possibly because execution never reached the resource(pvc).create() call), PVC_COUNTER was never incremented, so we shouldn't decrement the counter.
The variable `success` tracks the progress of PVC creation:
* value 0 means the PVC is not created.
* value 1 means the PVC has been created.
* value 2 means the PVC has been created but, due to a subsequent error, the pod is deleted.
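The bookkeeping above can be sketched as follows. This is a hedged illustration, not the Spark Kubernetes backend: PvcBookkeeping and createPodWithPvc are hypothetical, and only the PVC_COUNTER name and the three-state `success` variable follow the issue text.

```java
import java.util.concurrent.atomic.AtomicInteger;

class PvcBookkeeping {
    static final AtomicInteger PVC_COUNTER = new AtomicInteger(0);

    // Returns the final value of `success`:
    // 0 = PVC not created, 1 = PVC created, 2 = PVC created but pod deleted.
    int createPodWithPvc(boolean pvcCreateFails, boolean podCreateFails) {
        int success = 0;
        try {
            if (pvcCreateFails) throw new RuntimeException("pvc create failed");
            PVC_COUNTER.incrementAndGet();  // counted only once actually created
            success = 1;
            if (podCreateFails) throw new RuntimeException("pod create failed");
        } catch (RuntimeException e) {
            if (success == 1) {
                // The pod is deleted only when the PVC was actually created,
                // so this is the only path that decrements the counter.
                PVC_COUNTER.decrementAndGet();
                success = 2;
            }
            // success == 0: PVC was never created, so there is nothing to undo.
        }
        return success;
    }
}
```

The key invariant is that increments and decrements pair up: a decrement can only follow the increment on the same code path, so PVC_COUNTER never goes negative when PVC creation itself fails.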
[jira] [Resolved] (SPARK-41274) Bump Kubernetes Client Version to 6.2.0
[ https://issues.apache.org/jira/browse/SPARK-41274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu resolved SPARK-41274.
----------------------------
    Resolution: Duplicate

This is a duplicate of commit 02a2242a45062755bf7e20805958d5bdf1f5ed74

> Bump Kubernetes Client Version to 6.2.0
> ---------------------------------------
>
>                 Key: SPARK-41274
>                 URL: https://issues.apache.org/jira/browse/SPARK-41274
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes
>    Affects Versions: 3.4.0
>            Reporter: Ted Yu
>            Priority: Major
>
> Bump Kubernetes Client Version to 6.2.0
[jira] [Created] (SPARK-41274) Bump Kubernetes Client Version to 6.2.0
Ted Yu created SPARK-41274:
---------------------------
             Summary: Bump Kubernetes Client Version to 6.2.0
                 Key: SPARK-41274
                 URL: https://issues.apache.org/jira/browse/SPARK-41274
             Project: Spark
          Issue Type: Improvement
          Components: Kubernetes
    Affects Versions: 3.4.0
            Reporter: Ted Yu

Bump Kubernetes Client Version to 6.2.0
[jira] [Created] (SPARK-41197) Upgrade Kafka version to 3.3 release
Ted Yu created SPARK-41197:
---------------------------
             Summary: Upgrade Kafka version to 3.3 release
                 Key: SPARK-41197
                 URL: https://issues.apache.org/jira/browse/SPARK-41197
             Project: Spark
          Issue Type: Improvement
          Components: Java API
    Affects Versions: 3.3.1
            Reporter: Ted Yu

Kafka 3.3 has been released. This issue upgrades the Kafka dependency to the 3.3 release.
[jira] [Updated] (SPARK-40508) Treat unknown partitioning as UnknownPartitioning
[ https://issues.apache.org/jira/browse/SPARK-40508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated SPARK-40508:
---------------------------
    Description: 
When running a Spark application against Spark 3.3, I see the following:
{code}
java.lang.IllegalArgumentException: Unsupported data source V2 partitioning type: CustomPartitioning
    at org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioning$$anonfun$apply$1.applyOrElse(V2ScanPartitioning.scala:46)
    at org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioning$$anonfun$apply$1.applyOrElse(V2ScanPartitioning.scala:34)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
{code}
The CustomPartitioning works fine with Spark 3.2.1.
This PR proposes to relax the code and treat all unknown partitioning the same way as UnknownPartitioning.

  was:
When running spark application against spark 3.3, I see the following :
```
java.lang.IllegalArgumentException: Unsupported data source V2 partitioning type: CustomPartitioning
    at org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioning$$anonfun$apply$1.applyOrElse(V2ScanPartitioning.scala:46)
    at org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioning$$anonfun$apply$1.applyOrElse(V2ScanPartitioning.scala:34)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
```
The CustomPartitioning works fine with Spark 3.2.1
This PR proposes to relax the code and treat all unknown partitioning the same way as that for UnknownPartitioning.

> Treat unknown partitioning as UnknownPartitioning
> -------------------------------------------------
>
>                 Key: SPARK-40508
>                 URL: https://issues.apache.org/jira/browse/SPARK-40508
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.3.0
>            Reporter: Ted Yu
>            Priority: Major
[jira] [Created] (SPARK-40508) Treat unknown partitioning as UnknownPartitioning
Ted Yu created SPARK-40508:
---------------------------
             Summary: Treat unknown partitioning as UnknownPartitioning
                 Key: SPARK-40508
                 URL: https://issues.apache.org/jira/browse/SPARK-40508
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.3.0
            Reporter: Ted Yu

When running spark application against spark 3.3, I see the following :
```
java.lang.IllegalArgumentException: Unsupported data source V2 partitioning type: CustomPartitioning
    at org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioning$$anonfun$apply$1.applyOrElse(V2ScanPartitioning.scala:46)
    at org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioning$$anonfun$apply$1.applyOrElse(V2ScanPartitioning.scala:34)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
```
The CustomPartitioning works fine with Spark 3.2.1
This PR proposes to relax the code and treat all unknown partitioning the same way as that for UnknownPartitioning.
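The proposed relaxation can be sketched as a fallback in the mapping from data-source partitionings to catalyst partitionings: instead of throwing on an unrecognized type, treat it as UnknownPartitioning. The class names below are illustrative stand-ins, not Spark's actual V2ScanPartitioning code.

```java
// Stand-in types for the DSv2 partitioning hierarchy.
interface Partitioning {}
class KeyGroupedPartitioning implements Partitioning {}
class UnknownPartitioning implements Partitioning {}
class CustomPartitioning implements Partitioning {}  // user-defined, unknown to Spark

class PartitioningMapper {
    static Partitioning toCatalyst(Partitioning p) {
        if (p instanceof KeyGroupedPartitioning) {
            return p;  // recognized type: keep as-is
        }
        // Previously: throw new IllegalArgumentException(
        //     "Unsupported data source V2 partitioning type: " + p);
        // Proposed: fall back, so custom partitionings no longer fail the query.
        return new UnknownPartitioning();
    }
}
```

The query then simply loses the partitioning optimization for custom types instead of failing analysis, matching the Spark 3.2.1 behavior the reporter describes.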
[jira] [Created] (SPARK-39769) Rename trait Unevaluable
Ted Yu created SPARK-39769:
---------------------------
             Summary: Rename trait Unevaluable
                 Key: SPARK-39769
                 URL: https://issues.apache.org/jira/browse/SPARK-39769
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.3.0
            Reporter: Ted Yu

I came upon `trait Unevaluable` which is defined in sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
Unevaluable is not a word. There are `valuable`, `invaluable` but I have never seen Unevaluable.
This issue renames the trait to Unevaluatable
[jira] [Updated] (SPARK-38998) Call to MetricsSystem#getServletHandlers() may take place before MetricsSystem becomes running
[ https://issues.apache.org/jira/browse/SPARK-38998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated SPARK-38998:
---------------------------
    Description: 
Sometimes the following exception is observed:
{code}
java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem
    at scala.Predef$.require(Predef.scala:281)
    at org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:92)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:597)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
{code}
It seems that somehow the MetricsSystem was stopped between the start() and getServletHandlers() calls.
SparkContext should check that the MetricsSystem is running before calling getServletHandlers()

  was:
Sometimes the following exception is observed:
{code}
java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem
    at scala.Predef$.require(Predef.scala:281)
    at org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:92)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:597)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
{code}
It seems SparkContext should wait till the MetricsSystem becomes running before calling getServletHandlers()

> Call to MetricsSystem#getServletHandlers() may take place before
> MetricsSystem becomes running
> ----------------------------------------------------------------
>
>                 Key: SPARK-38998
>                 URL: https://issues.apache.org/jira/browse/SPARK-38998
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.1.2
>            Reporter: Ted Yu
>            Priority: Major
[jira] [Updated] (SPARK-38998) Call to MetricsSystem#getServletHandlers() may take place before MetricsSystem becomes running
[ https://issues.apache.org/jira/browse/SPARK-38998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated SPARK-38998:
---------------------------
    Description: 
Sometimes the following exception is observed:
{code}
java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem
    at scala.Predef$.require(Predef.scala:281)
    at org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:92)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:597)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
{code}
It seems SparkContext should wait till the MetricsSystem becomes running before calling getServletHandlers()

  was:
Sometimes the following exception is observed:
```
java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem
    at scala.Predef$.require(Predef.scala:281)
    at org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:92)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:597)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
```
It seems SparkContext should wait till the MetricsSystem becomes running before calling getServletHandlers()

> Call to MetricsSystem#getServletHandlers() may take place before
> MetricsSystem becomes running
> ----------------------------------------------------------------
>
>                 Key: SPARK-38998
>                 URL: https://issues.apache.org/jira/browse/SPARK-38998
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.1.2
>            Reporter: Ted Yu
>            Priority: Major
[jira] [Created] (SPARK-38998) Call to MetricsSystem#getServletHandlers() may take place before MetricsSystem becomes running
Ted Yu created SPARK-38998:
---------------------------
             Summary: Call to MetricsSystem#getServletHandlers() may take place before MetricsSystem becomes running
                 Key: SPARK-38998
                 URL: https://issues.apache.org/jira/browse/SPARK-38998
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.1.2
            Reporter: Ted Yu

Sometimes the following exception is observed:
```
java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem
    at scala.Predef$.require(Predef.scala:281)
    at org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:92)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:597)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
```
It seems SparkContext should wait till the MetricsSystem becomes running before calling getServletHandlers()
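The guard suggested in this issue can be sketched as follows: poll until the metrics system reports itself running (with a deadline) before calling the method that requires the running state. This is a hedged illustration; MetricsSystemStub, MetricsGuard, and handlersWhenRunning are hypothetical names, with only the "running MetricsSystem" precondition taken from the exception message above.

```java
import java.util.concurrent.TimeUnit;

// Minimal stand-in for the MetricsSystem precondition described above.
class MetricsSystemStub {
    private volatile boolean running = false;
    void start() { running = true; }
    boolean isRunning() { return running; }
    Object[] getServletHandlers() {
        if (!running) throw new IllegalArgumentException(
            "requirement failed: Can only call getServletHandlers on a running MetricsSystem");
        return new Object[0];
    }
}

class MetricsGuard {
    // Spin until the system is running or the deadline passes, then call the
    // guarded method; a still-stopped system will surface the original error.
    static Object[] handlersWhenRunning(MetricsSystemStub ms, long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!ms.isRunning() && System.nanoTime() < deadline) {
            Thread.onSpinWait();
        }
        return ms.getServletHandlers();
    }
}
```

A timeout matters here: if the MetricsSystem was stopped rather than not-yet-started, waiting forever would hang SparkContext construction instead of failing fast.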
[jira] [Commented] (SPARK-34532) IntervalUtils.add() may result in 'long overflow'
[ https://issues.apache.org/jira/browse/SPARK-34532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17290646#comment-17290646 ]

Ted Yu commented on SPARK-34532:
--------------------------------

Included the test command and some more information in the description. You should see these errors when you run the command.

> IntervalUtils.add() may result in 'long overflow'
> -------------------------------------------------
>
>                 Key: SPARK-34532
>                 URL: https://issues.apache.org/jira/browse/SPARK-34532
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.0.2
>            Reporter: Ted Yu
>            Priority: Major
>
> I noticed the following when running test suite:
> build/sbt "sql/testOnly *SQLQueryTestSuite"
> {code}
> 19:10:17.977 ERROR org.apache.spark.scheduler.TaskSetManager: Task 1 in stage 6416.0 failed 1 times; aborting job
> [info] - postgreSQL/int4.sql (2 seconds, 543 milliseconds)
> 19:10:20.994 ERROR org.apache.spark.executor.Executor: Exception in task 1.0 in stage 6476.0 (TID 7789)
> java.lang.ArithmeticException: long overflow
>     at java.lang.Math.multiplyExact(Math.java:892)
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown Source)
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
>     at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
>     at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:345)
>     at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
>     at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> {code}
> {code}
> 19:15:38.255 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in stage 14744.0 (TID 16705)
> java.lang.ArithmeticException: long overflow
>     at java.lang.Math.addExact(Math.java:809)
>     at org.apache.spark.sql.types.LongExactNumeric$.plus(numerics.scala:105)
>     at org.apache.spark.sql.types.LongExactNumeric$.plus(numerics.scala:104)
>     at org.apache.spark.sql.catalyst.expressions.Add.nullSafeEval(arithmetic.scala:268)
>     at org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:573)
>     at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(InterpretedMutableProjection.scala:97)
> {code}
> This likely was caused by the following line:
> {code}
> val microseconds = left.microseconds + right.microseconds
> {code}
> We should check whether the addition would produce overflow before adding.
[jira] [Updated] (SPARK-34532) IntervalUtils.add() may result in 'long overflow'
[ https://issues.apache.org/jira/browse/SPARK-34532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated SPARK-34532:
---------------------------
    Description: 
I noticed the following when running test suite:
build/sbt "sql/testOnly *SQLQueryTestSuite"
{code}
19:10:17.977 ERROR org.apache.spark.scheduler.TaskSetManager: Task 1 in stage 6416.0 failed 1 times; aborting job
[info] - postgreSQL/int4.sql (2 seconds, 543 milliseconds)
19:10:20.994 ERROR org.apache.spark.executor.Executor: Exception in task 1.0 in stage 6476.0 (TID 7789)
java.lang.ArithmeticException: long overflow
    at java.lang.Math.multiplyExact(Math.java:892)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:345)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
{code}
{code}
19:15:38.255 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in stage 14744.0 (TID 16705)
java.lang.ArithmeticException: long overflow
    at java.lang.Math.addExact(Math.java:809)
    at org.apache.spark.sql.types.LongExactNumeric$.plus(numerics.scala:105)
    at org.apache.spark.sql.types.LongExactNumeric$.plus(numerics.scala:104)
    at org.apache.spark.sql.catalyst.expressions.Add.nullSafeEval(arithmetic.scala:268)
    at org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:573)
    at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(InterpretedMutableProjection.scala:97)
{code}
This likely was caused by the following line:
{code}
val microseconds = left.microseconds + right.microseconds
{code}
We should check whether the addition would produce overflow before adding.

  was:
I noticed the following when running test suite:
{code}
19:15:38.255 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in stage 14744.0 (TID 16705)
java.lang.ArithmeticException: long overflow
    at java.lang.Math.addExact(Math.java:809)
    at org.apache.spark.sql.types.LongExactNumeric$.plus(numerics.scala:105)
    at org.apache.spark.sql.types.LongExactNumeric$.plus(numerics.scala:104)
    at org.apache.spark.sql.catalyst.expressions.Add.nullSafeEval(arithmetic.scala:268)
    at org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:573)
    at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(InterpretedMutableProjection.scala:97)
{code}
This likely was caused by the following line:
{code}
val microseconds = left.microseconds + right.microseconds
{code}
We should check whether the addition would produce overflow before adding.

> IntervalUtils.add() may result in 'long overflow'
> -------------------------------------------------
>
>                 Key: SPARK-34532
>                 URL: https://issues.apache.org/jira/browse/SPARK-34532
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.0.2
>            Reporter: Ted Yu
>            Priority: Major
[jira] [Created] (SPARK-34532) IntervalUtils.add() may result in 'long overflow'
Ted Yu created SPARK-34532:
---------------------------
             Summary: IntervalUtils.add() may result in 'long overflow'
                 Key: SPARK-34532
                 URL: https://issues.apache.org/jira/browse/SPARK-34532
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.0.2
            Reporter: Ted Yu

I noticed the following when running test suite:
{code}
19:15:38.255 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in stage 14744.0 (TID 16705)
java.lang.ArithmeticException: long overflow
    at java.lang.Math.addExact(Math.java:809)
    at org.apache.spark.sql.types.LongExactNumeric$.plus(numerics.scala:105)
    at org.apache.spark.sql.types.LongExactNumeric$.plus(numerics.scala:104)
    at org.apache.spark.sql.catalyst.expressions.Add.nullSafeEval(arithmetic.scala:268)
    at org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:573)
    at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(InterpretedMutableProjection.scala:97)
{code}
This likely was caused by the following line:
{code}
val microseconds = left.microseconds + right.microseconds
{code}
We should check whether the addition would produce overflow before adding.
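The overflow check the issue asks for can be done with the JDK's Math.addExact, which is exactly what the stack trace above shows Spark's LongExactNumeric using: it throws ArithmeticException("long overflow") instead of silently wrapping around. The IntervalStub type below is a hypothetical stand-in for the interval class, not Spark's CalendarInterval.

```java
// Hypothetical interval type illustrating the overflow-checked addition.
final class IntervalStub {
    final long microseconds;
    IntervalStub(long microseconds) { this.microseconds = microseconds; }

    // Instead of: left.microseconds + right.microseconds (silent wraparound),
    // Math.addExact detects overflow at the point of addition.
    static IntervalStub add(IntervalStub left, IntervalStub right) {
        return new IntervalStub(Math.addExact(left.microseconds, right.microseconds));
    }
}
```

An explicit pre-check (e.g. comparing against Long.MAX_VALUE - right.microseconds) would also work, but Math.addExact compiles to an intrinsic and keeps the intent obvious.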
[jira] [Commented] (SPARK-34476) Duplicate referenceNames are given for ambiguousReferences
[ https://issues.apache.org/jira/browse/SPARK-34476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287403#comment-17287403 ]

Ted Yu commented on SPARK-34476:
--------------------------------

The basic jsonb test is here: https://github.com/yugabyte/yugabyte-db/blob/master/java/yb-cql-4x/src/test/java/org/yb/loadtest/TestSpark3Jsonb.java

I am working on adding a get_json_string() function (via Spark extension) which is similar to get_json_object() but expands the last jsonb field using '->>' instead of '->'.

> Duplicate referenceNames are given for ambiguousReferences
> ----------------------------------------------------------
>
>                 Key: SPARK-34476
>                 URL: https://issues.apache.org/jira/browse/SPARK-34476
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Ted Yu
>            Priority: Major
>
> When running a test with a Spark extension that converts a custom function to a json path expression, I saw the following in the test output:
> {code}
> 2021-02-19 21:57:24,550 (Time-limited test) [INFO - org.yb.loadtest.TestSpark3Jsonb.testJsonb(TestSpark3Jsonb.java:102)] plan is
> == Physical Plan ==
> org.apache.spark.sql.AnalysisException: Reference 'phone->'key'->1->'m'->2->>'b'' is ambiguous, could be: mycatalog.test.person.phone->'key'->1->'m'->2->>'b', mycatalog.test.person.phone->'key'->1->'m'->2->>'b'.; line 1 pos 8
> {code}
> Please note the candidates following 'could be' are the same.
> Here is the physical plan for a working query where phone is a jsonb column:
> {code}
> TakeOrderedAndProject(limit=2, orderBy=[id#6 ASC NULLS FIRST], output=[id#6,address#7,key#0])
> +- *(1) Project [id#6, address#7, phone->'key'->1->'m'->2->'b'#12 AS key#0]
>    +- BatchScan[id#6, address#7, phone->'key'->1->'m'->2->'b'#12] Cassandra Scan: test.person
>      - Cassandra Filters: [[phone->'key'->1->'m'->2->>'b' >= ?, 100]]
>      - Requested Columns: [id,address,phone->'key'->1->'m'->2->'b']
> {code}
> The difference for the failed query is that it tries to use {code}phone->'key'->1->'m'->2->>'b'{code} in the projection (which works as part of filter).
[jira] [Updated] (SPARK-34476) Duplicate referenceNames are given for ambiguousReferences
[ https://issues.apache.org/jira/browse/SPARK-34476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-34476: --- Description: When running test with Spark extension that converts custom function to json path expression, I saw the following in test output: {code} 2021-02-19 21:57:24,550 (Time-limited test) [INFO - org.yb.loadtest.TestSpark3Jsonb.testJsonb(TestSpark3Jsonb.java:102)] plan is == Physical Plan == org.apache.spark.sql.AnalysisException: Reference 'phone->'key'->1->'m'->2->>'b'' is ambiguous, could be: mycatalog.test.person.phone->'key'->1->'m'->2->>'b', mycatalog.test.person.phone->'key'->1->'m'->2->>'b'.; line 1 pos 8 {code} Please note the candidates following 'could be' are the same. Here is the physical plan for a working query where phone is a jsonb column: {code} TakeOrderedAndProject(limit=2, orderBy=[id#6 ASC NULLS FIRST], output=[id#6,address#7,key#0]) +- *(1) Project [id#6, address#7, phone->'key'->1->'m'->2->'b'#12 AS key#0] +- BatchScan[id#6, address#7, phone->'key'->1->'m'->2->'b'#12] Cassandra Scan: test.person - Cassandra Filters: [[phone->'key'->1->'m'->2->>'b' >= ?, 100]] - Requested Columns: [id,address,phone->'key'->1->'m'->2->'b'] {code} The difference for the failed query is that it tries to use {code}phone->'key'->1->'m'->2->>'b'{code} in the projection (which works as part of filter). was: When running test with Spark extension that converts custom function to json path expression, I saw the following in test output: {code} 2021-02-19 21:57:24,550 (Time-limited test) [INFO - org.yb.loadtest.TestSpark3Jsonb.testJsonb(TestSpark3Jsonb.java:102)] plan is == Physical Plan == org.apache.spark.sql.AnalysisException: Reference 'phone->'key'->1->'m'->2->>'b'' is ambiguous, could be: mycatalog.test.person.phone->'key'->1->'m'->2->>'b', mycatalog.test.person.phone->'key'->1->'m'->2->>'b'.; line 1 pos 8 {code} Please note the candidates following 'could be' are the same. 
Here is the physical plan for a working query where phone is a jsonb column: {code} TakeOrderedAndProject(limit=2, orderBy=[id#6 ASC NULLS FIRST], output=[id#6,address#7,key#0]) +- *(1) Project [id#6, address#7, phone->'key'->1->'m'->2->'b'#12 AS key#0] +- BatchScan[id#6, address#7, phone->'key'->1->'m'->2->'b'#12] Cassandra Scan: test.person - Cassandra Filters: [[phone->'key'->1->'m'->2->>'b' >= ?, 100]] - Requested Columns: [id,address,phone->'key'->1->'m'->2->'b'] {code} The difference for the failed query is that it tries to use phone->'key'->1->'m'->2->>'b' in the projection (which works as part of filter). > Duplicate referenceNames are given for ambiguousReferences > -- > > Key: SPARK-34476 > URL: https://issues.apache.org/jira/browse/SPARK-34476 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Ted Yu >Priority: Major > > When running test with Spark extension that converts custom function to json > path expression, I saw the following in test output: > {code} > 2021-02-19 21:57:24,550 (Time-limited test) [INFO - > org.yb.loadtest.TestSpark3Jsonb.testJsonb(TestSpark3Jsonb.java:102)] plan is > == Physical Plan == > org.apache.spark.sql.AnalysisException: Reference > 'phone->'key'->1->'m'->2->>'b'' is ambiguous, could be: > mycatalog.test.person.phone->'key'->1->'m'->2->>'b', > mycatalog.test.person.phone->'key'->1->'m'->2->>'b'.; line 1 pos 8 > {code} > Please note the candidates following 'could be' are the same. 
> Here is the physical plan for a working query where phone is a jsonb column: > {code} > TakeOrderedAndProject(limit=2, orderBy=[id#6 ASC NULLS FIRST], > output=[id#6,address#7,key#0]) > +- *(1) Project [id#6, address#7, phone->'key'->1->'m'->2->'b'#12 AS key#0] >+- BatchScan[id#6, address#7, phone->'key'->1->'m'->2->'b'#12] Cassandra > Scan: test.person > - Cassandra Filters: [[phone->'key'->1->'m'->2->>'b' >= ?, 100]] > - Requested Columns: [id,address,phone->'key'->1->'m'->2->'b'] > {code} > The difference for the failed query is that it tries to use > {code}phone->'key'->1->'m'->2->>'b'{code} in the projection (which works as > part of filter). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34476) Duplicate referenceNames are given for ambiguousReferences
Ted Yu created SPARK-34476: -- Summary: Duplicate referenceNames are given for ambiguousReferences Key: SPARK-34476 URL: https://issues.apache.org/jira/browse/SPARK-34476 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.0.0 Reporter: Ted Yu When running test with Spark extension that converts custom function to json path expression, I saw the following in test output: {code} 2021-02-19 21:57:24,550 (Time-limited test) [INFO - org.yb.loadtest.TestSpark3Jsonb.testJsonb(TestSpark3Jsonb.java:102)] plan is == Physical Plan == org.apache.spark.sql.AnalysisException: Reference 'phone->'key'->1->'m'->2->>'b'' is ambiguous, could be: mycatalog.test.person.phone->'key'->1->'m'->2->>'b', mycatalog.test.person.phone->'key'->1->'m'->2->>'b'.; line 1 pos 8 {code} Please note the candidates following 'could be' are the same. Here is the physical plan for a working query where phone is a jsonb column: {code} TakeOrderedAndProject(limit=2, orderBy=[id#6 ASC NULLS FIRST], output=[id#6,address#7,key#0]) +- *(1) Project [id#6, address#7, phone->'key'->1->'m'->2->'b'#12 AS key#0] +- BatchScan[id#6, address#7, phone->'key'->1->'m'->2->'b'#12] Cassandra Scan: test.person - Cassandra Filters: [[phone->'key'->1->'m'->2->>'b' >= ?, 100]] - Requested Columns: [id,address,phone->'key'->1->'m'->2->'b'] {code} The difference for the failed query is that it tries to use phone->'key'->1->'m'->2->>'b' in the projection (which works as part of filter). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34017) Pass json column information via pruneColumns()
[ https://issues.apache.org/jira/browse/SPARK-34017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259423#comment-17259423 ] Ted Yu commented on SPARK-34017: For PushDownUtils#pruneColumns, I am experimenting with the following: {code} case r: SupportsPushDownRequiredColumns if SQLConf.get.nestedSchemaPruningEnabled => val JSONCapture = "get_json_object\\((.*), *(.*)\\)".r var jsonRootFields : ArrayBuffer[RootField] = ArrayBuffer() projects.map{ _.map{ f => f.toString match { case JSONCapture(column, field) => jsonRootFields += RootField(StructField(column, f.dataType, f.nullable), derivedFromAtt = false, prunedIfAnyChildAccessed = true) case _ => logDebug("else " + f) }}} val rootFields = SchemaPruning.identifyRootFields(projects, filters) ++ jsonRootFields {code} > Pass json column information via pruneColumns() > --- > > Key: SPARK-34017 > URL: https://issues.apache.org/jira/browse/SPARK-34017 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.1 >Reporter: Ted Yu >Priority: Major > > Currently PushDownUtils#pruneColumns only passes root fields to > SupportsPushDownRequiredColumns implementation(s). > {code} > 2021-01-05 19:36:07,437 (Time-limited test) [DEBUG - > org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema > projection List(id#33, address#34, phone#36, get_json_object(phone#36, > $.code) AS get_json_object(phone, $.code)#37) > 2021-01-05 19:36:07,438 (Time-limited test) [DEBUG - > org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema > StructType(StructField(id,IntegerType,false), > StructField(address,StringType,true), StructField(phone,StringType,true)) > {code} > The first line shows projections and the second line shows the pruned schema. > We can see that get_json_object(phone#36, $.code) is filtered. This > expression retrieves field 'code' from phone json column. 
> We should allow json column information to be passed via pruneColumns(). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
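The Scala experiment quoted above captures get_json_object(column, path) calls from the string form of each projection via a regex. A minimal Python sketch of the same capture step (illustrative only; the real code works on Catalyst expressions, not strings):

```python
import re

# Same capture pattern as the Scala experiment above.
JSON_CAPTURE = re.compile(r"get_json_object\((.*), *(.*)\)")

def extract_json_root_field(projection):
    """Return (column, json path) if the projection string is a
    get_json_object(...) call, else None."""
    m = JSON_CAPTURE.search(projection)
    if m is None:
        return None
    return m.group(1), m.group(2)

print(extract_json_root_field("get_json_object(phone#36, $.code)"))
# ('phone#36', '$.code')
```

A hit would then be turned into an extra RootField so the json column survives pruning; a miss leaves the projection untouched.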
[jira] [Updated] (SPARK-34017) Pass json column information via pruneColumns()
[ https://issues.apache.org/jira/browse/SPARK-34017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-34017: --- Description: Currently PushDownUtils#pruneColumns only passes root fields to SupportsPushDownRequiredColumns implementation(s). {code} 2021-01-05 19:36:07,437 (Time-limited test) [DEBUG - org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema projection List(id#33, address#34, phone#36, get_json_object(phone#36, $.code) AS get_json_object(phone, $.code)#37) 2021-01-05 19:36:07,438 (Time-limited test) [DEBUG - org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema StructType(StructField(id,IntegerType,false), StructField(address,StringType,true), StructField(phone,StringType,true)) {code} The first line shows projections and the second line shows the pruned schema. We can see that get_json_object(phone#36, $.code) is filtered. This expression retrieves field 'code' from phone json column. We should allow json column information to be passed via pruneColumns(). was: Currently PushDownUtils#pruneColumns only passes root fields to SupportsPushDownRequiredColumns implementation(s). 2021-01-05 19:36:07,437 (Time-limited test) [DEBUG - org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema projection List(id#33, address#34, phone#36, get_json_object(phone#36, $.code) AS get_json_object(phone, $.code)#37) 2021-01-05 19:36:07,438 (Time-limited test) [DEBUG - org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema StructType(StructField(id,IntegerType,false), StructField(address,StringType,true), StructField(phone,StringType,true)) The first line shows projections and the second line shows the pruned schema. We can see that get_json_object(phone#36, $.code) is filtered. This expression retrieves field 'code' from phone json column. We should allow json column information to be passed via pruneColumns(). 
> Pass json column information via pruneColumns() > --- > > Key: SPARK-34017 > URL: https://issues.apache.org/jira/browse/SPARK-34017 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.1 >Reporter: Ted Yu >Priority: Major > > Currently PushDownUtils#pruneColumns only passes root fields to > SupportsPushDownRequiredColumns implementation(s). > {code} > 2021-01-05 19:36:07,437 (Time-limited test) [DEBUG - > org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema > projection List(id#33, address#34, phone#36, get_json_object(phone#36, > $.code) AS get_json_object(phone, $.code)#37) > 2021-01-05 19:36:07,438 (Time-limited test) [DEBUG - > org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema > StructType(StructField(id,IntegerType,false), > StructField(address,StringType,true), StructField(phone,StringType,true)) > {code} > The first line shows projections and the second line shows the pruned schema. > We can see that get_json_object(phone#36, $.code) is filtered. This > expression retrieves field 'code' from phone json column. > We should allow json column information to be passed via pruneColumns(). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34017) Pass json column information via pruneColumns()
Ted Yu created SPARK-34017: -- Summary: Pass json column information via pruneColumns() Key: SPARK-34017 URL: https://issues.apache.org/jira/browse/SPARK-34017 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.1 Reporter: Ted Yu Currently PushDownUtils#pruneColumns only passes root fields to SupportsPushDownRequiredColumns implementation(s). 2021-01-05 19:36:07,437 (Time-limited test) [DEBUG - org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema projection List(id#33, address#34, phone#36, get_json_object(phone#36, $.code) AS get_json_object(phone, $.code)#37) 2021-01-05 19:36:07,438 (Time-limited test) [DEBUG - org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema StructType(StructField(id,IntegerType,false), StructField(address,StringType,true), StructField(phone,StringType,true)) The first line shows projections and the second line shows the pruned schema. We can see that get_json_object(phone#36, $.code) is filtered. This expression retrieves field 'code' from phone json column. We should allow json column information to be passed via pruneColumns(). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33997) Running bin/spark-sql gives NoSuchMethodError
[ https://issues.apache.org/jira/browse/SPARK-33997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved SPARK-33997. Resolution: Cannot Reproduce Rebuilt Spark locally and the error was gone. > Running bin/spark-sql gives NoSuchMethodError > - > > Key: SPARK-33997 > URL: https://issues.apache.org/jira/browse/SPARK-33997 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: Ted Yu >Priority: Major > > I ran 'mvn install -Phive -Phive-thriftserver -DskipTests' > Running bin/spark-sql gives the following error: > {code} > 21/01/05 00:06:06 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > Using Spark's default log4j profile: > org/apache/spark/log4j-defaults.properties > Setting default log level to "WARN". > To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use > setLogLevel(newLevel). > Exception in thread "main" java.lang.NoSuchMethodError: > org.apache.spark.sql.internal.SharedState$.loadHiveConfFile$default$3()Lscala/collection/Map; > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:136) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:934) > {code} > Scala version 2.12.10 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: 
issues-h...@spark.apache.org
[jira] [Created] (SPARK-33997) Running bin/spark-sql gives NoSuchMethodError
Ted Yu created SPARK-33997: -- Summary: Running bin/spark-sql gives NoSuchMethodError Key: SPARK-33997 URL: https://issues.apache.org/jira/browse/SPARK-33997 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.1 Reporter: Ted Yu I ran 'mvn install -Phive -Phive-thriftserver -DskipTests' Running bin/spark-sql gives the following error: {code} 21/01/05 00:06:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.internal.SharedState$.loadHiveConfFile$default$3()Lscala/collection/Map; at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:136) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:934) {code} Scala version 2.12.10 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33915) Allow json expression to be pushable column
[ https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17257647#comment-17257647 ] Ted Yu commented on SPARK-33915: Here is sample code for capturing the column and fields in downstream PredicatePushDown.scala {code} private val JSONCapture = "`GetJsonObject\\((.*),(.*)\\)`".r private def transformGetJsonObject(p: Predicate): Predicate = { val eq = p.asInstanceOf[sources.EqualTo] eq.attribute match { case JSONCapture(column,field) => val colName = column.toString.split("#")(0) val names = field.toString.split("\\.").foldLeft(List[String]()){(z, n) => z :+ "->'"+n+"'" } sources.EqualTo(colName + names.slice(1, names.size).mkString(""), eq.value).asInstanceOf[Predicate] case _ => sources.EqualTo("foo", "bar").asInstanceOf[Predicate] } } {code} > Allow json expression to be pushable column > --- > > Key: SPARK-33915 > URL: https://issues.apache.org/jira/browse/SPARK-33915 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: Ted Yu >Assignee: Apache Spark >Priority: Major > > Currently PushableColumnBase provides no support for json / jsonb expression. > Example of json expression: > {code} > get_json_object(phone, '$.code') = '1200' > {code} > If non-string literal is part of the expression, the presence of cast() would > complicate the situation. > Implication is that implementation of SupportsPushDownFilters doesn't have a > chance to perform pushdown even if third party DB engine supports json > expression pushdown. > This issue is for discussion and implementation of Spark core changes which > would allow json expression to be recognized as pushable column. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
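The transformGetJsonObject sketch above rewrites a pushed attribute name like `GetJsonObject(phone#37,$.code)` into a jsonb accessor that the database can filter on. A Python rendering of that same rewrite (illustrative only, not the actual connector code):

```python
import re

# Mirrors the Scala transformGetJsonObject sketch above.
ATTR_CAPTURE = re.compile(r"`GetJsonObject\((.*),(.*)\)`")

def to_jsonb_filter(attribute):
    """Rewrite an attribute like `GetJsonObject(phone#37,$.code)`
    into a jsonb accessor such as phone->'code'."""
    m = ATTR_CAPTURE.match(attribute)
    if m is None:
        return None
    column, path = m.group(1), m.group(2)
    col_name = column.split("#")[0]  # drop the #exprId suffix
    # "$.a.b" -> ["$", "a", "b"]; skip the leading "$" segment.
    segments = ["->'%s'" % seg for seg in path.split(".")][1:]
    return col_name + "".join(segments)

print(to_jsonb_filter("`GetJsonObject(phone#37,$.code)`"))
# phone->'code'
```

This reproduces the filter shown in the corrected plan later in this thread, [phone->'code' = ?, 1200].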
[jira] [Issue Comment Deleted] (SPARK-33915) Allow json expression to be pushable column
[ https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-33915: --- Comment: was deleted (was: Opened https://github.com/apache/spark/pull/30984) > Allow json expression to be pushable column > --- > > Key: SPARK-33915 > URL: https://issues.apache.org/jira/browse/SPARK-33915 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: Ted Yu >Assignee: Apache Spark >Priority: Major > > Currently PushableColumnBase provides no support for json / jsonb expression. > Example of json expression: > {code} > get_json_object(phone, '$.code') = '1200' > {code} > If non-string literal is part of the expression, the presence of cast() would > complicate the situation. > Implication is that implementation of SupportsPushDownFilters doesn't have a > chance to perform pushdown even if third party DB engine supports json > expression pushdown. > This issue is for discussion and implementation of Spark core changes which > would allow json expression to be recognized as pushable column. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33953) readImages test fails due to NoClassDefFoundError for ImageTypeSpecifier
Ted Yu created SPARK-33953: -- Summary: readImages test fails due to NoClassDefFoundError for ImageTypeSpecifier Key: SPARK-33953 URL: https://issues.apache.org/jira/browse/SPARK-33953 Project: Spark Issue Type: Test Components: ML Affects Versions: 3.0.1 Reporter: Ted Yu From https://github.com/apache/spark/pull/30984/checks?check_run_id=1630709203 : {code} [info] org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 21.0 failed 1 times, most recent failure: Lost task 1.0 in stage 21.0 (TID 20) (fv-az212-589.internal.cloudapp.net executor driver): java.lang.NoClassDefFoundError: Could not initialize class javax.imageio.ImageTypeSpecifier [info] at com.sun.imageio.plugins.png.PNGImageReader.getImageTypes(PNGImageReader.java:1531) [info] at com.sun.imageio.plugins.png.PNGImageReader.readImage(PNGImageReader.java:1318) {code} It seems some dependency is missing. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-33915) Allow json expression to be pushable column
[ https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17257025#comment-17257025 ] Ted Yu commented on SPARK-33915: Opened https://github.com/apache/spark/pull/30984 > Allow json expression to be pushable column > --- > > Key: SPARK-33915 > URL: https://issues.apache.org/jira/browse/SPARK-33915 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: Ted Yu >Priority: Major > > Currently PushableColumnBase provides no support for json / jsonb expression. > Example of json expression: > {code} > get_json_object(phone, '$.code') = '1200' > {code} > If non-string literal is part of the expression, the presence of cast() would > complicate the situation. > Implication is that implementation of SupportsPushDownFilters doesn't have a > chance to perform pushdown even if third party DB engine supports json > expression pushdown. > This issue is for discussion and implementation of Spark core changes which > would allow json expression to be recognized as pushable column. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-33915) Allow json expression to be pushable column
[ https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255389#comment-17255389 ] Ted Yu edited comment on SPARK-33915 at 12/31/20, 3:16 PM: --- Here is the plan prior to predicate pushdown: {code} 2020-12-26 03:28:59,926 (Time-limited test) [DEBUG - org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0 +- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS phone#33] +- Filter (get_json_object(phone#37, $.code) = 1200) +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person - Cassandra Filters: [] - Requested Columns: [id,address,phone] {code} Here is the plan with pushdown: {code} 2020-12-28 01:40:08,150 (Time-limited test) [DEBUG - org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0 +- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS phone#33] +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person - Cassandra Filters: [[phone->'code' = ?, 1200]] - Requested Columns: [id,address,phone] {code} was (Author: yuzhih...@gmail.com): Here is the plan prior to predicate pushdown: {code} 2020-12-26 03:28:59,926 (Time-limited test) [DEBUG - org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0 +- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS phone#33] +- Filter (get_json_object(phone#37, $.phone) = 1200) +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person - Cassandra Filters: [] - Requested Columns: [id,address,phone] {code} Here is the plan with pushdown: {code} 2020-12-28 01:40:08,150 (Time-limited test) [DEBUG - org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive execution enabled for plan: Sort 
[id#34 ASC NULLS FIRST], true, 0 +- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS phone#33] +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person - Cassandra Filters: [[phone->'phone' = ?, 1200]] - Requested Columns: [id,address,phone] {code} > Allow json expression to be pushable column > --- > > Key: SPARK-33915 > URL: https://issues.apache.org/jira/browse/SPARK-33915 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: Ted Yu >Priority: Major > > Currently PushableColumnBase provides no support for json / jsonb expression. > Example of json expression: > {code} > get_json_object(phone, '$.code') = '1200' > {code} > If non-string literal is part of the expression, the presence of cast() would > complicate the situation. > Implication is that implementation of SupportsPushDownFilters doesn't have a > chance to perform pushdown even if third party DB engine supports json > expression pushdown. > This issue is for discussion and implementation of Spark core changes which > would allow json expression to be recognized as pushable column. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-33915) Allow json expression to be pushable column
[ https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255389#comment-17255389 ] Ted Yu edited comment on SPARK-33915 at 12/29/20, 6:51 PM: --- Here is the plan prior to predicate pushdown: {code} 2020-12-26 03:28:59,926 (Time-limited test) [DEBUG - org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0 +- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS phone#33] +- Filter (get_json_object(phone#37, $.phone) = 1200) +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person - Cassandra Filters: [] - Requested Columns: [id,address,phone] {code} Here is the plan with pushdown: {code} 2020-12-28 01:40:08,150 (Time-limited test) [DEBUG - org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0 +- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS phone#33] +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person - Cassandra Filters: [[phone->'phone' = ?, 1200]] - Requested Columns: [id,address,phone] {code} was (Author: yuzhih...@gmail.com): Here is the plan prior to predicate pushdown: {code} 2020-12-26 03:28:59,926 (Time-limited test) [DEBUG - org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0 +- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS phone#33] +- Filter (get_json_object(phone#37, $.phone) = 1200) +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person - Cassandra Filters: [] - Requested Columns: [id,address,phone] {code} Here is the plan with pushdown: {code} 2020-12-28 01:40:08,150 (Time-limited test) [DEBUG - org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive execution enabled for plan: Sort 
[id#34 ASC NULLS FIRST], true, 0 +- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS phone#33] +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person - Cassandra Filters: [["`GetJsonObject(phone#37,$.phone)`" = ?, 1200]] - Requested Columns: [id,address,phone] {code} > Allow json expression to be pushable column > --- > > Key: SPARK-33915 > URL: https://issues.apache.org/jira/browse/SPARK-33915 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: Ted Yu >Priority: Major > > Currently PushableColumnBase provides no support for json / jsonb expression. > Example of json expression: > {code} > get_json_object(phone, '$.code') = '1200' > {code} > If non-string literal is part of the expression, the presence of cast() would > complicate the situation. > Implication is that implementation of SupportsPushDownFilters doesn't have a > chance to perform pushdown even if third party DB engine supports json > expression pushdown. > This issue is for discussion and implementation of Spark core changes which > would allow json expression to be recognized as pushable column. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33915) Allow json expression to be pushable column
[ https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255631#comment-17255631 ] Ted Yu commented on SPARK-33915: [~codingcat] [~XuanYuan][~viirya][~Alex Herman] Can you provide your comment ? Thanks > Allow json expression to be pushable column > --- > > Key: SPARK-33915 > URL: https://issues.apache.org/jira/browse/SPARK-33915 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: Ted Yu >Priority: Major > > Currently PushableColumnBase provides no support for json / jsonb expression. > Example of json expression: > {code} > get_json_object(phone, '$.code') = '1200' > {code} > If non-string literal is part of the expression, the presence of cast() would > complicate the situation. > Implication is that implementation of SupportsPushDownFilters doesn't have a > chance to perform pushdown even if third party DB engine supports json > expression pushdown. > This issue is for discussion and implementation of Spark core changes which > would allow json expression to be recognized as pushable column. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33915) Allow json expression to be pushable column
[ https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255389#comment-17255389 ] Ted Yu commented on SPARK-33915: Here is the plan prior to predicate pushdown: {code} 2020-12-26 03:28:59,926 (Time-limited test) [DEBUG - org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0 +- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS phone#33] +- Filter (get_json_object(phone#37, $.phone) = 1200) +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person - Cassandra Filters: [] - Requested Columns: [id,address,phone] {code} Here is the plan with pushdown: {code} 2020-12-28 01:40:08,150 (Time-limited test) [DEBUG - org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0 +- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS phone#33] +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person - Cassandra Filters: [["`GetJsonObject(phone#37,$.phone)`" = ?, 1200]] - Requested Columns: [id,address,phone] {code} > Allow json expression to be pushable column > --- > > Key: SPARK-33915 > URL: https://issues.apache.org/jira/browse/SPARK-33915 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: Ted Yu >Priority: Major > > Currently PushableColumnBase provides no support for json / jsonb expression. > Example of json expression: > {code} > get_json_object(phone, '$.code') = '1200' > {code} > If non-string literal is part of the expression, the presence of cast() would > complicate the situation. > Implication is that implementation of SupportsPushDownFilters doesn't have a > chance to perform pushdown even if third party DB engine supports json > expression pushdown. 
> This issue is for discussion and implementation of Spark core changes which > would allow json expression to be recognized as pushable column. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-33915) Allow json expression to be pushable column
[ https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254944#comment-17254944 ] Ted Yu edited comment on SPARK-33915 at 12/26/20, 4:14 AM: --- I have experimented with two patches which allow json expression to be pushable column (without the presence of cast). The first involves duplicating GetJsonObject as GetJsonString where eval() method returns UTF8String. FunctionRegistry.scala and functions.scala would have corresponding change. In PushableColumnBase class, new case for GetJsonString is added. Since code duplication is undesirable and case class cannot be directly subclassed, there is more work to be done in this direction. The second is simpler. The return type of GetJsonObject#eval is changed to UTF8String. I have run through catalyst / sql core unit tests which passed. In PushableColumnBase class, new case for GetJsonObject is added. In test output, I observed the following (for second patch, output for first patch is similar): {code} Post-Scan Filters: (get_json_object(phone#37, $.code) = 1200) {code} Comment on the two approaches (and other approach) is welcome. Below is snippet for second approach. 
{code} diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala index c22b68890a..8004ddd735 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala @@ -139,7 +139,7 @@ case class GetJsonObject(json: Expression, path: Expression) @transient private lazy val parsedPath = parsePath(path.eval().asInstanceOf[UTF8String]) - override def eval(input: InternalRow): Any = { + override def eval(input: InternalRow): UTF8String = { val jsonStr = json.eval(input).asInstanceOf[UTF8String] if (jsonStr == null) { return null diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala index e4f001d61a..1cdc2642ba 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala @@ -723,6 +725,12 @@ abstract class PushableColumnBase { } case s: GetStructField if nestedPredicatePushdownEnabled => helper(s.child).map(_ :+ s.childSchema(s.ordinal).name) + case GetJsonObject(col, field) => +Some(Seq("GetJsonObject(" + col + "," + field + ")")) case _ => None } helper(e).map(_.quoted) {code} was (Author: yuzhih...@gmail.com): I have experimented with two patches which allow json expression to be pushable column (without the presence of cast). The first involves duplicating GetJsonObject as GetJsonString where eval() method returns UTF8String. FunctionRegistry.scala and functions.scala would have corresponding change. In PushableColumnBase class, new case for GetJsonString is added. 
Since code duplication is undesirable and case class cannot be directly subclassed, there is more work to be done in this direction. The second is simpler. The return type of GetJsonObject#eval is changed to UTF8String. I have run through catalyst / sql core unit tests which passed. In PushableColumnBase class, new case for GetJsonObject is added. In test output, I observed the following (for second patch, output for first patch is similar): {code} Post-Scan Filters: (get_json_object(phone#37, $.phone) = 1200) {code} Comment on the two approaches (and other approach) is welcome. Below is snippet for second approach. {code} diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala index c22b68890a..8004ddd735 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala @@ -139,7 +139,7 @@ case class GetJsonObject(json: Expression, path: Expression) @transient private lazy val parsedPath = parsePath(path.eval().asInstanceOf[UTF8String]) - override def eval(input: InternalRow): Any = { + override def eval(input: InternalRow): UTF8String = { val jsonStr = json.eval(input).asInstanceOf[UTF8String] if (jsonStr == null) { return null diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.s
[jira] [Commented] (SPARK-33915) Allow json expression to be pushable column
[ https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254944#comment-17254944 ] Ted Yu commented on SPARK-33915: I have experimented with two patches which allow json expression to be pushable column (without the presence of cast). The first involves duplicating GetJsonObject as GetJsonString where eval() method returns UTF8String. FunctionRegistry.scala and functions.scala would have corresponding change. In PushableColumnBase class, new case for GetJsonString is added. Since code duplication is undesirable and case class cannot be directly subclassed, there is more work to be done in this direction. The second is simpler. The return type of GetJsonObject#eval is changed to UTF8String. I have run through catalyst / sql core unit tests which passed. In PushableColumnBase class, new case for GetJsonObject is added. In test output, I observed the following (for second patch, output for first patch is similar): Post-Scan Filters: (get_json_object(phone#37, $.phone) = 1200) Comment on the two approaches (and other approach) is welcome. Below is snippet for second approach. 
{code} diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala index c22b68890a..8004ddd735 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala @@ -139,7 +139,7 @@ case class GetJsonObject(json: Expression, path: Expression) @transient private lazy val parsedPath = parsePath(path.eval().asInstanceOf[UTF8String]) - override def eval(input: InternalRow): Any = { + override def eval(input: InternalRow): UTF8String = { val jsonStr = json.eval(input).asInstanceOf[UTF8String] if (jsonStr == null) { return null diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala index e4f001d61a..1cdc2642ba 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala @@ -723,6 +725,12 @@ abstract class PushableColumnBase { } case s: GetStructField if nestedPredicatePushdownEnabled => helper(s.child).map(_ :+ s.childSchema(s.ordinal).name) + case GetJsonObject(col, field) => +Some(Seq("GetJsonObject(" + col + "," + field + ")")) case _ => None } helper(e).map(_.quoted) {code} > Allow json expression to be pushable column > --- > > Key: SPARK-33915 > URL: https://issues.apache.org/jira/browse/SPARK-33915 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: Ted Yu >Priority: Major > > Currently PushableColumnBase provides no support for json / jsonb expression. 
> Example of json expression: > {code} > get_json_object(phone, '$.code') = '1200' > {code} > If non-string literal is part of the expression, the presence of cast() would > complicate the situation. > Implication is that implementation of SupportsPushDownFilters doesn't have a > chance to perform pushdown even if third party DB engine supports json > expression pushdown. > This issue is for discussion and implementation of Spark core changes which > would allow json expression to be recognized as pushable column.
[jira] [Updated] (SPARK-33915) Allow json expression to be pushable column
[ https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-33915: --- Description: Currently PushableColumnBase provides no support for json / jsonb expression. Example of json expression: {code} get_json_object(phone, '$.code') = '1200' {code} If non-string literal is part of the expression, the presence of cast() would complicate the situation. Implication is that implementation of SupportsPushDownFilters doesn't have a chance to perform pushdown even if third party DB engine supports json expression pushdown. This issue is for discussion and implementation of Spark core changes which would allow json expression to be recognized as pushable column. was: Currently PushableColumnBase provides no support for json / jsonb expression. Example of json expression: get_json_object(phone, '$.code') = '1200' If non-string literal is part of the expression, the presence of cast() would complicate the situation. Implication is that implementation of SupportsPushDownFilters doesn't have a chance to perform pushdown even if third party DB engine supports json expression pushdown. This issue is for discussion and implementation of Spark core changes which would allow json expression to be recognized as pushable column. > Allow json expression to be pushable column > --- > > Key: SPARK-33915 > URL: https://issues.apache.org/jira/browse/SPARK-33915 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: Ted Yu >Priority: Major > > Currently PushableColumnBase provides no support for json / jsonb expression. > Example of json expression: > {code} > get_json_object(phone, '$.code') = '1200' > {code} > If non-string literal is part of the expression, the presence of cast() would > complicate the situation. 
> Implication is that implementation of SupportsPushDownFilters doesn't have a > chance to perform pushdown even if third party DB engine supports json > expression pushdown. > This issue is for discussion and implementation of Spark core changes which > would allow json expression to be recognized as pushable column.
[jira] [Created] (SPARK-33915) Allow json expression to be pushable column
Ted Yu created SPARK-33915: -- Summary: Allow json expression to be pushable column Key: SPARK-33915 URL: https://issues.apache.org/jira/browse/SPARK-33915 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.0.1 Reporter: Ted Yu Currently PushableColumnBase provides no support for json / jsonb expression. Example of json expression: get_json_object(phone, '$.code') = '1200' If non-string literal is part of the expression, the presence of cast() would complicate the situation. Implication is that implementation of SupportsPushDownFilters doesn't have a chance to perform pushdown even if third party DB engine supports json expression pushdown. This issue is for discussion and implementation of Spark core changes which would allow json expression to be recognized as pushable column.
[jira] [Commented] (SPARK-25954) Upgrade to Kafka 2.1.0
[ https://issues.apache.org/jira/browse/SPARK-25954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677152#comment-16677152 ] Ted Yu commented on SPARK-25954: Looking at Kafka thread, message from Satish indicated there may be another RC coming. > Upgrade to Kafka 2.1.0 > -- > > Key: SPARK-25954 > URL: https://issues.apache.org/jira/browse/SPARK-25954 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > Kafka 2.1.0 RC0 is started. Since this includes official KAFKA-7264 JDK 11 > support, we had better use that. > - > https://lists.apache.org/thread.html/8288f0afdfed4d329f1a8338320b6e24e7684a0593b4bbd6f1b79101@%3Cdev.kafka.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25201) Synchronization performed on AtomicReference in LevelDB class
Ted Yu created SPARK-25201: -- Summary: Synchronization performed on AtomicReference in LevelDB class Key: SPARK-25201 URL: https://issues.apache.org/jira/browse/SPARK-25201 Project: Spark Issue Type: Bug Components: Input/Output Affects Versions: 2.3.1 Reporter: Ted Yu Here is related code: {code} void closeIterator(LevelDBIterator it) throws IOException { synchronized (this._db) { {code} There is more than one occurrence of such usage.
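The snippet above synchronizes on `this._db`, which is an `AtomicReference`. Locking on the reference holder itself mixes two concerns: the atomic reference exists for lock-free reads, while its monitor is being used for mutual exclusion. A minimal sketch of the alternative, guarding the close/iterator bookkeeping with a dedicated lock object (all names here are hypothetical, not the actual LevelDB class):

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: instead of synchronizing on the AtomicReference itself,
// use a dedicated monitor object for mutual exclusion and keep the
// AtomicReference for lock-free visibility of the open/closed state.
class StoreHandle {
    private final AtomicReference<String> db = new AtomicReference<>("open");
    private final Object dbLock = new Object(); // dedicated monitor

    void closeIterator() {
        synchronized (dbLock) {        // was: synchronized (this.db)
            if (db.get() != null) {
                // ... release iterator resources while the store is open ...
            }
        }
    }

    void close() {
        synchronized (dbLock) {
            db.set(null);              // mark the store closed
        }
    }

    boolean isOpen() {
        return db.get() != null;       // lock-free read path
    }
}
```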
[jira] [Commented] (SPARK-23636) [SPARK 2.2] | Kafka Consumer | KafkaUtils.createRDD throws Exception - java.util.ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access
[ https://issues.apache.org/jira/browse/SPARK-23636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16527204#comment-16527204 ] Ted Yu commented on SPARK-23636: It seems in KafkaDataConsumer#close : {code} def close(): Unit = consumer.close() {code} The code should catch ConcurrentModificationException and try closing the consumer again. > [SPARK 2.2] | Kafka Consumer | KafkaUtils.createRDD throws Exception - > java.util.ConcurrentModificationException: KafkaConsumer is not safe for > multi-threaded access > - > > Key: SPARK-23636 > URL: https://issues.apache.org/jira/browse/SPARK-23636 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.1.1, 2.2.0 >Reporter: Deepak >Priority: Major > Labels: performance > > h2.  > h2. Summary >  > While using the KafkaUtils.createRDD API - we receive below listed error, > specifically when 1 executor connects to 1 kafka topic-partition, but with > more than 1 core & fetches an Array(OffsetRanges) >  > _I've tagged this issue to "Structured Streaming" - as I could not find a > more appropriate component_ >  > > h2. 
Error Faced > {noformat} > java.util.ConcurrentModificationException: KafkaConsumer is not safe for > multi-threaded access{noformat} >  Stack Trace > {noformat} > Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 5 in stage 1.0 failed 4 times, most recent failure: Lost task 5.3 in > stage 1.0 (TID 17, host, executor 16): > java.util.ConcurrentModificationException: KafkaConsumer is not safe for > multi-threaded access > at > org.apache.kafka.clients.consumer.KafkaConsumer.acquire(KafkaConsumer.java:1629) > at > org.apache.kafka.clients.consumer.KafkaConsumer.close(KafkaConsumer.java:1528) > at > org.apache.kafka.clients.consumer.KafkaConsumer.close(KafkaConsumer.java:1508) > at > org.apache.spark.streaming.kafka010.CachedKafkaConsumer.close(CachedKafkaConsumer.scala:59) > at > org.apache.spark.streaming.kafka010.CachedKafkaConsumer$.remove(CachedKafkaConsumer.scala:185) > at > org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.(KafkaRDD.scala:204) > at org.apache.spark.streaming.kafka010.KafkaRDD.compute(KafkaRDD.scala:181) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323){noformat} >  > > h2. Config Used to simulate the error > A session with : > * Executors - 1 > * Cores - 2 or More > * Kafka Topic - has only 1 partition > * While fetching - More than one Array of Offset Range , Example > {noformat} > Array(OffsetRange("kafka_topic",0,608954201,608954202), > OffsetRange("kafka_topic",0,608954202,608954203) > ){noformat} >  > > h2. Was this approach working before? >  > This was working in spark 1.6.2 > However, from spark 2.1 onwards - the approach throws exception >  > > h2. Why are we fetching from kafka as mentioned above. >  > This gives us the capability to establish a connection to Kafka Broker for > every spark executor's core, thus each core can fetch/process its own set of > messages based on the specified (offset ranges). >  >  > > h2. 
Sample Code >  > {quote}scala snippet - on versions spark 2.2.0 or 2.1.0 > // Bunch of imports > import kafka.serializer.\{DefaultDecoder, StringDecoder} > import org.apache.avro.generic.GenericRecord > import org.apache.kafka.clients.consumer.ConsumerRecord > import org.apache.kafka.common.serialization._ > import org.apache.spark.rdd.RDD > import org.apache.spark.sql.\{DataFrame, Row, SQLContext} > import org.apache.spark.sql.Row > import org.apache.spark.sql.hive.HiveContext > import org.apache.spark.sql.types.\{StringType, StructField, StructType} > import org.apache.spark.streaming.kafka010._ > import org.apache.spark.streaming.kafka010.KafkaUtils._ > {quote} > {quote}// This forces two connections - from a single executor - to > topic-partition . > // And with 2 cores assigned to 1 executor : each core has a task - pulling > respective offsets : OffsetRange("kafka_topic",0,1,2) & > OffsetRange("kafka_topic",0,2,3) > val parallelizedRanges = Array(OffsetRange("kafka_topic",0,1,2), // Fetching > sample 2 records > OffsetRange("kafka_topic",0,2,3) // Fetching sample 2 records > ) >  > // Initiate kafka properties > val kafkaParams1: java.util.Map[String, Object] = new java.util.HashMap() > // kafkaParams1.put("key","val") add all the parameters such as broker, > topic Not listing every property here. >  > // Create RDD > val rDDConsumerRec: RDD[ConsumerRecord[String, String]] = > createRDD[String, String](sparkContext > , kafkaParams1, parallelizedRanges, LocationStrategies.Prefe
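The suggestion in the comment above, catching `ConcurrentModificationException` from `close()` and retrying, can be sketched as follows. This is an illustrative sketch only, not the actual `KafkaDataConsumer` code; the `Closer` interface stands in for the real `KafkaConsumer`:

```java
import java.util.ConcurrentModificationException;

// Sketch of a bounded retry-on-close: if close() throws
// ConcurrentModificationException (another thread currently holds the
// consumer), back off briefly and retry instead of propagating immediately.
final class RetryingCloser {
    interface Closer { void close(); }

    static void closeWithRetry(Closer consumer, int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            try {
                consumer.close();
                return;
            } catch (ConcurrentModificationException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up after bounded retries
                }
                try {
                    Thread.sleep(10L * attempt); // simple linear backoff
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
    }
}
```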
[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 0.10.0.1 to 1.1.0
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499535#comment-16499535 ] Ted Yu commented on SPARK-18057: I have created a PR, shown above. Kafka 2.0.0 is used since that would be the release in which KIP-266 is integrated. If there is no objection, the JIRA title should be modified. > Update structured streaming kafka from 0.10.0.1 to 1.1.0 > > > Key: SPARK-18057 > URL: https://issues.apache.org/jira/browse/SPARK-18057 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Reporter: Cody Koeninger >Priority: Major > > There are a couple of relevant KIPs here, > https://archive.apache.org/dist/kafka/0.10.1.0/RELEASE_NOTES.html
[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 0.10.0.1 to 1.1.0
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497070#comment-16497070 ] Ted Yu commented on SPARK-18057: I tend to agree with Cody. Just wondering if other people would accept the kafka-0-10-sql module referencing the Kafka 2.0.0 release. > Update structured streaming kafka from 0.10.0.1 to 1.1.0 > > > Key: SPARK-18057 > URL: https://issues.apache.org/jira/browse/SPARK-18057 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Reporter: Cody Koeninger >Priority: Major > > There are a couple of relevant KIPs here, > https://archive.apache.org/dist/kafka/0.10.1.0/RELEASE_NOTES.html
[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 0.10.0.1 to 1.1.0
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495580#comment-16495580 ] Ted Yu commented on SPARK-18057: There are compilation errors in KafkaTestUtils.scala against Kafka 1.0.1 release. What's the preferred way forward: * use reflection in KafkaTestUtils.scala * create external/kafka-1-0-sql module Thanks > Update structured streaming kafka from 0.10.0.1 to 1.1.0 > > > Key: SPARK-18057 > URL: https://issues.apache.org/jira/browse/SPARK-18057 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Reporter: Cody Koeninger >Priority: Major > > There are a couple of relevant KIPs here, > https://archive.apache.org/dist/kafka/0.10.1.0/RELEASE_NOTES.html
[jira] [Commented] (SPARK-21569) Internal Spark class needs to be kryo-registered
[ https://issues.apache.org/jira/browse/SPARK-21569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472295#comment-16472295 ] Ted Yu commented on SPARK-21569: What would be the workaround? Thanks > Internal Spark class needs to be kryo-registered > > > Key: SPARK-21569 > URL: https://issues.apache.org/jira/browse/SPARK-21569 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Ryan Williams >Priority: Major > > [Full repro here|https://github.com/ryan-williams/spark-bugs/tree/hf] > As of 2.2.0, {{saveAsNewAPIHadoopFile}} jobs fail (when > {{spark.kryo.registrationRequired=true}}) with: > {code} > java.lang.IllegalArgumentException: Class is not registered: > org.apache.spark.internal.io.FileCommitProtocol$TaskCommitMessage > Note: To register this class use: > kryo.register(org.apache.spark.internal.io.FileCommitProtocol$TaskCommitMessage.class); > at com.esotericsoftware.kryo.Kryo.getRegistration(Kryo.java:458) > at > com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:79) > at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:488) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:593) > at > org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:315) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:383) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} > This internal Spark class should be kryo-registered by Spark by default. > This was not a problem in 2.1.1.
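Until Spark registers the class by default, a possible workaround (an assumption, not something confirmed in this thread: it relies on the internal class being nameable from user configuration) is to list it in `spark.kryo.classesToRegister`, mirroring the `kryo.register(...)` hint in the error message:

```
# spark-defaults.conf fragment (a sketch; the class name is taken from the error above)
spark.kryo.registrationRequired   true
spark.kryo.classesToRegister      org.apache.spark.internal.io.FileCommitProtocol$TaskCommitMessage
```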
[jira] [Commented] (SPARK-23714) Add metrics for cached KafkaConsumer
[ https://issues.apache.org/jira/browse/SPARK-23714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416138#comment-16416138 ] Ted Yu commented on SPARK-23714: [~tdas]: What do you think? Thanks > Add metrics for cached KafkaConsumer > > > Key: SPARK-23714 > URL: https://issues.apache.org/jira/browse/SPARK-23714 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.3.0 >Reporter: Ted Yu >Priority: Major > > SPARK-23623 added KafkaDataConsumer to avoid concurrent use of cached > KafkaConsumer. > This JIRA is to add metrics for measuring the operations of the cache so that > users can gain insight into the caching solution.
[jira] [Commented] (SPARK-23714) Add metrics for cached KafkaConsumer
[ https://issues.apache.org/jira/browse/SPARK-23714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403792#comment-16403792 ] Ted Yu commented on SPARK-23714: Ryan reminded me of the possibility of using the existing metrics system. I think that implies passing SparkEnv to KafkaDataConsumer. > Add metrics for cached KafkaConsumer > > > Key: SPARK-23714 > URL: https://issues.apache.org/jira/browse/SPARK-23714 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.3.0 >Reporter: Ted Yu >Priority: Major > > SPARK-23623 added KafkaDataConsumer to avoid concurrent use of cached > KafkaConsumer. > This JIRA is to add metrics for measuring the operations of the cache so that > users can gain insight into the caching solution.
[jira] [Commented] (SPARK-23714) Add metrics for cached KafkaConsumer
[ https://issues.apache.org/jira/browse/SPARK-23714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403112#comment-16403112 ] Ted Yu commented on SPARK-23714: [~tdas] made the following suggestion: Apache Commons Pool does expose metrics on idle/active counts. See https://commons.apache.org/proper/commons-pool/apidocs/org/apache/commons/pool2/impl/BaseGenericObjectPool.html > Add metrics for cached KafkaConsumer > > > Key: SPARK-23714 > URL: https://issues.apache.org/jira/browse/SPARK-23714 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.3.0 >Reporter: Ted Yu >Priority: Major > > SPARK-23623 added KafkaDataConsumer to avoid concurrent use of cached > KafkaConsumer. > This JIRA is to add metrics for measuring the operations of the cache so that > users can gain insight into the caching solution.
[jira] [Created] (SPARK-23714) Add metrics for cached KafkaConsumer
Ted Yu created SPARK-23714: -- Summary: Add metrics for cached KafkaConsumer Key: SPARK-23714 URL: https://issues.apache.org/jira/browse/SPARK-23714 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 2.3.0 Reporter: Ted Yu SPARK-23623 added KafkaDataConsumer to avoid concurrent use of cached KafkaConsumer. This JIRA is to add metrics for measuring the operations of the cache so that users can gain insight into the caching solution. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23347) Introduce buffer between Java data stream and gzip stream
[ https://issues.apache.org/jira/browse/SPARK-23347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354939#comment-16354939 ] Ted Yu commented on SPARK-23347: {code} public final byte[] serialize(Object o) throws Exception { {code} when o is an integer, I assume. Looked at ObjectMapper.java source code where generator is involved. Need to dig some more. > Introduce buffer between Java data stream and gzip stream > - > > Key: SPARK-23347 > URL: https://issues.apache.org/jira/browse/SPARK-23347 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Ted Yu >Priority: Minor > > Currently GZIPOutputStream is used directly around ByteArrayOutputStream > e.g. from KVStoreSerializer : > {code} > ByteArrayOutputStream bytes = new ByteArrayOutputStream(); > GZIPOutputStream out = new GZIPOutputStream(bytes); > {code} > This seems inefficient. > GZIPOutputStream does not implement the write(byte) method. It only provides > a write(byte[], offset, len) method, which calls the corresponding JNI zlib > function. > BufferedOutputStream can be introduced wrapping GZIPOutputStream for better > performance. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23347) Introduce buffer between Java data stream and gzip stream
[ https://issues.apache.org/jira/browse/SPARK-23347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354907#comment-16354907 ] Ted Yu commented on SPARK-23347: See JDK 1.8 code:
{code}
class DeflaterOutputStream {
    public void write(int b) throws IOException {
        byte[] buf = new byte[1];
        buf[0] = (byte)(b & 0xff);
        write(buf, 0, 1);
    }

    public void write(byte[] b, int off, int len) throws IOException {
        if (def.finished()) {
            throw new IOException("write beyond end of stream");
        }
        if ((off | len | (off + len) | (b.length - (off + len))) < 0) {
            throw new IndexOutOfBoundsException();
        } else if (len == 0) {
            return;
        }
        if (!def.finished()) {
            def.setInput(b, off, len);
            while (!def.needsInput()) {
                deflate();
            }
        }
    }
}

class GZIPOutputStream extends DeflaterOutputStream {
    public synchronized void write(byte[] buf, int off, int len) throws IOException {
        super.write(buf, off, len);
        crc.update(buf, off, len);
    }
}

class Deflater {
    private native int deflateBytes(long addr, byte[] b, int off, int len, int flush);
}

class CRC32 {
    public void update(byte[] b, int off, int len) {
        if (b == null) {
            throw new NullPointerException();
        }
        if (off < 0 || len < 0 || off > b.length - len) {
            throw new ArrayIndexOutOfBoundsException();
        }
        crc = updateBytes(crc, b, off, len);
    }

    private native static int updateBytes(int crc, byte[] b, int off, int len);
}
{code}
For each data byte, the code above has to allocate a single one-byte array, acquire several locks, and call two native JNI methods (Deflater.deflateBytes and CRC32.updateBytes). > Introduce buffer between Java data stream and gzip stream > - > > Key: SPARK-23347 > URL: https://issues.apache.org/jira/browse/SPARK-23347 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Ted Yu >Priority: Minor > > Currently GZIPOutputStream is used directly around ByteArrayOutputStream > e.g.
from KVStoreSerializer : > {code} > ByteArrayOutputStream bytes = new ByteArrayOutputStream(); > GZIPOutputStream out = new GZIPOutputStream(bytes); > {code} > This seems inefficient. > GZIPOutputStream does not implement the write(byte) method. It only provides > a write(byte[], offset, len) method, which calls the corresponding JNI zlib > function. > BufferedOutputStream can be introduced wrapping GZIPOutputStream for better > performance. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
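The change this issue proposes can be sketched as follows; the 8 KB buffer size and class name are illustrative choices, not the actual Spark patch:

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class BufferedGzip {
    // Compress data, batching writes through a BufferedOutputStream so
    // that per-byte callers do not pay one JNI deflate + one CRC update
    // each. The 8 KB buffer size is an illustrative choice.
    static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes);
             BufferedOutputStream out = new BufferedOutputStream(gzip, 8 * 1024)) {
            for (byte b : data) {
                out.write(b);   // buffered; reaches gzip in large chunks
            }
        }
        return bytes.toByteArray();
    }

    static byte[] decompress(byte[] gz) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }
}
```

Single-byte writes now hit the in-memory buffer; the expensive native path runs only once per flushed chunk.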
[jira] [Created] (SPARK-23347) Introduce buffer between Java data stream and gzip stream
Ted Yu created SPARK-23347: -- Summary: Introduce buffer between Java data stream and gzip stream Key: SPARK-23347 URL: https://issues.apache.org/jira/browse/SPARK-23347 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 2.2.0 Reporter: Ted Yu Currently GZIPOutputStream is used directly around ByteArrayOutputStream e.g. from KVStoreSerializer : {code} ByteArrayOutputStream bytes = new ByteArrayOutputStream(); GZIPOutputStream out = new GZIPOutputStream(bytes); {code} This seems inefficient. GZIPOutputStream does not implement the write(byte) method. It only provides a write(byte[], offset, len) method, which calls the corresponding JNI zlib function. BufferedOutputStream can be introduced wrapping GZIPOutputStream for better performance. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10561) Provide tooling for auto-generating Spark SQL reference manual
[ https://issues.apache.org/jira/browse/SPARK-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-10561: --- Labels: tool (was: ) > Provide tooling for auto-generating Spark SQL reference manual > -- > > Key: SPARK-10561 > URL: https://issues.apache.org/jira/browse/SPARK-10561 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL >Reporter: Ted Yu > Labels: tool > > Here is the discussion thread: > http://search-hadoop.com/m/q3RTtcD20F1o62xE > Richard Hillegas made the following suggestion: > A machine-generated BNF, however, is easy to imagine. But perhaps not so easy > to implement. Spark's SQL grammar is implemented in Scala, extending the DSL > support provided by the Scala language. I am new to programming in Scala, so > I don't know whether the Scala ecosystem provides any good tools for > reverse-engineering a BNF from a class which extends > scala.util.parsing.combinator.syntactical.StandardTokenParsers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10868) monotonicallyIncreasingId() supports offset for indexing
[ https://issues.apache.org/jira/browse/SPARK-10868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15425607#comment-15425607 ] Ted Yu commented on SPARK-10868: bq. this makes sense What was the reasoning behind the initial assessment? > monotonicallyIncreasingId() supports offset for indexing > > > Key: SPARK-10868 > URL: https://issues.apache.org/jira/browse/SPARK-10868 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 1.5.0 >Reporter: Martin Senne > > With SPARK-7135 and https://github.com/apache/spark/pull/5709 > `monotonicallyIncreasingID()` allows creating an index column with unique > ids. The indexing always starts at 0 (no offset). > *Feature wish* > Having a parameter `offset`, such that the function can be used as > {{monotonicallyIncreasingID( offset )}} > and indexing _starts at *offset* instead of 0_. > *Use-case* > Add rows to a DataFrame that is already written to a DB (via > _.write.jdbc(...)_). > In detail: > - A DataFrame *A* (containing an ID column) and having indices from 0 to 199 > in that column already exists in the DB. > - New rows need to be added to *A*. This includes > -- Creating a DataFrame *A'* with new rows, but without an id column > -- Adding the index column to *A'* - this time starting at *200*, as there are > already entries with id's from 0 to 199 (*here, monotonicallyIncreasingID( 200 > ) is required.*) > -- union *A* and *A'* > -- store into DB -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
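For context on why an offset composes cleanly: monotonicallyIncreasingID() encodes the partition index in the upper 31 bits of the id and a per-partition record counter in the lower 33 bits, so the requested offset would simply be added on top. A sketch of that layout (generateId is a hypothetical helper, not Spark code):

```java
// Sketch of the id layout behind monotonicallyIncreasingID(): partition
// index in the upper 31 bits, per-partition row counter in the lower
// 33 bits. The proposed offset parameter is simply added on top.
// generateId is a hypothetical helper for illustration, not Spark code.
public class MonotonicId {
    static long generateId(int partitionId, long rowInPartition, long offset) {
        return ((long) partitionId << 33) + rowInPartition + offset;
    }
}
```

With offset 200, partition 0 starts at id 200, matching the use case of appending to a table whose existing ids end at 199.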
[jira] [Commented] (SPARK-10868) monotonicallyIncreasingId() supports offset for indexing
[ https://issues.apache.org/jira/browse/SPARK-10868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414164#comment-15414164 ] Ted Yu commented on SPARK-10868: I can pick up this one if Martin hasn't started working on it. > monotonicallyIncreasingId() supports offset for indexing > > > Key: SPARK-10868 > URL: https://issues.apache.org/jira/browse/SPARK-10868 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 1.5.0 >Reporter: Martin Senne > > With SPARK-7135 and https://github.com/apache/spark/pull/5709 > `monotonicallyIncreasingID()` allows to create an index column with unique > ids. The indexing always starts at 0 (no offset). > *Feature wish* > Having a parameter `offset`, such that the function can be used as > {{monotonicallyIncreasingID( offset )}} > and indexing _starts at *offset* instead of 0_. > *Use-case* > Add rows to a DataFrame that is already written to a DB (via > _.write.jdbc(...)_). > In detail: > - A DataFrame *A* (containing an ID column) and having indices from 0 to 199 > in that column is existent in DB. > - New rows need to be added to *A*. This included > -- Creating a DataFrame *A'* with new rows, but without id column > -- Add the index column to *A'* - this time starting at *200*, as there are > already entries with id's from 0 to 199 (*here, monotonicallyInreasingID( 200 > ) is required.*) > -- union *A* and *A'* > -- store into DB -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15273) YarnSparkHadoopUtil#getOutOfMemoryErrorArgument should respect OnOutOfMemoryError parameter given by user
Ted Yu created SPARK-15273: -- Summary: YarnSparkHadoopUtil#getOutOfMemoryErrorArgument should respect OnOutOfMemoryError parameter given by user Key: SPARK-15273 URL: https://issues.apache.org/jira/browse/SPARK-15273 Project: Spark Issue Type: Bug Reporter: Ted Yu As Nirav reported in this thread: http://search-hadoop.com/m/q3RTtdF3yNLMd7u YarnSparkHadoopUtil#getOutOfMemoryErrorArgument previously specified 'kill %p' unconditionally. We should respect the parameter given by the user. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
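For reference, the user-supplied handler that getOutOfMemoryErrorArgument should respect is the HotSpot OnOutOfMemoryError option; the escalation command in this fragment is illustrative, not from the Spark patch:

```shell
# User-supplied OOM handler that YarnSparkHadoopUtil should honor instead
# of unconditionally appending its own 'kill %p' (command is illustrative):
spark-submit \
  --conf "spark.executor.extraJavaOptions=-XX:OnOutOfMemoryError='kill -9 %p'" \
  ...
```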
[jira] [Created] (SPARK-15003) Use ConcurrentHashMap in place of HashMap for NewAccumulator.originals
Ted Yu created SPARK-15003: -- Summary: Use ConcurrentHashMap in place of HashMap for NewAccumulator.originals Key: SPARK-15003 URL: https://issues.apache.org/jira/browse/SPARK-15003 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Ted Yu This issue proposes to use ConcurrentHashMap in place of HashMap for NewAccumulator.originals. This should result in better performance. It also removes AccumulatorContext#accumIds. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
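The proposed swap is mechanical; the benefit is that registration and lookup then need no external lock around the map. A sketch in the shape of NewAccumulator.originals (class and method names below are illustrative, not the actual Spark fields):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative registry in the shape of NewAccumulator.originals:
// id -> accumulator. ConcurrentHashMap makes register/lookup safe
// without a synchronized block around a plain HashMap.
class AccumulatorRegistry {
    private final ConcurrentMap<Long, Object> originals = new ConcurrentHashMap<>();

    void register(long id, Object acc) {
        originals.putIfAbsent(id, acc);   // atomic check-then-put
    }

    Object get(long id) {
        return originals.get(id);         // lock-free read
    }
}
```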
[jira] [Created] (SPARK-14448) Improvements to ColumnVector
Ted Yu created SPARK-14448: -- Summary: Improvements to ColumnVector Key: SPARK-14448 URL: https://issues.apache.org/jira/browse/SPARK-14448 Project: Spark Issue Type: Improvement Components: SQL Reporter: Ted Yu Priority: Minor In this issue, two changes are proposed for ColumnVector: 1. ColumnVector should be declared as implementing AutoCloseable - it already has a close() method. 2. In OnHeapColumnVector#reserveInternal(), we only need to allocate a new array when the existing array is null or shorter than newCapacity. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
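Both changes are small; a generic sketch of the two patterns (not the actual ColumnVector code): try-with-resources enabled by AutoCloseable, and reserve() skipping reallocation when the existing array is already large enough:

```java
import java.util.Arrays;

// Generic sketch of the two proposed ColumnVector changes: implement
// AutoCloseable (close() already exists), and only reallocate in
// reserve() when the current array is null or too short.
class GrowableVector implements AutoCloseable {
    private long[] data;
    private boolean closed = false;

    void reserve(int newCapacity) {
        if (data == null || data.length < newCapacity) {
            data = (data == null)
                ? new long[newCapacity]
                : Arrays.copyOf(data, newCapacity);   // grow, keep contents
        }
        // else: existing array is already big enough; no allocation
    }

    int capacity() { return data == null ? 0 : data.length; }

    @Override
    public void close() { closed = true; data = null; }

    boolean isClosed() { return closed; }
}
```

Declaring AutoCloseable costs nothing (the method already exists) and lets callers write `try (GrowableVector v = ...) { ... }`.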
[jira] [Created] (SPARK-13663) Upgrade Snappy Java to 1.1.2.1
Ted Yu created SPARK-13663: -- Summary: Upgrade Snappy Java to 1.1.2.1 Key: SPARK-13663 URL: https://issues.apache.org/jira/browse/SPARK-13663 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Ted Yu Priority: Minor The JVM memory leak problem reported in https://github.com/xerial/snappy-java/issues/131 has been resolved. 1.1.2.1 was released on Jan 22nd. We should upgrade to this release. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10561) Provide tooling for auto-generating Spark SQL reference manual
[ https://issues.apache.org/jira/browse/SPARK-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-10561: --- Description: Here is the discussion thread: http://search-hadoop.com/m/q3RTtcD20F1o62xE Richard Hillegas made the following suggestion: A machine-generated BNF, however, is easy to imagine. But perhaps not so easy to implement. Spark's SQL grammar is implemented in Scala, extending the DSL support provided by the Scala language. I am new to programming in Scala, so I don't know whether the Scala ecosystem provides any good tools for reverse-engineering a BNF from a class which extends scala.util.parsing.combinator.syntactical.StandardTokenParsers. was: Here is the discussion thread: http://search-hadoop.com/m/q3RTtcD20F1o62xE Richard Hillegas made the following suggestion: A machine-generated BNF, however, is easy to imagine. But perhaps not so easy to implement. Spark's SQL grammar is implemented in Scala, extending the DSL support provided by the Scala language. I am new to programming in Scala, so I don't know whether the Scala ecosystem provides any good tools for reverse-engineering a BNF from a class which extends scala.util.parsing.combinator.syntactical.StandardTokenParsers. > Provide tooling for auto-generating Spark SQL reference manual > -- > > Key: SPARK-10561 > URL: https://issues.apache.org/jira/browse/SPARK-10561 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL >Reporter: Ted Yu > > Here is the discussion thread: > http://search-hadoop.com/m/q3RTtcD20F1o62xE > Richard Hillegas made the following suggestion: > A machine-generated BNF, however, is easy to imagine. But perhaps not so easy > to implement. Spark's SQL grammar is implemented in Scala, extending the DSL > support provided by the Scala language. 
I am new to programming in Scala, so > I don't know whether the Scala ecosystem provides any good tools for > reverse-engineering a BNF from a class which extends > scala.util.parsing.combinator.syntactical.StandardTokenParsers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13180) Protect against SessionState being null when accessing HiveClientImpl#conf
[ https://issues.apache.org/jira/browse/SPARK-13180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15137674#comment-15137674 ] Ted Yu commented on SPARK-13180: I wonder if we should provide better error message when NPE happens - the cause may be mixed dependencies. See last response on the thread. > Protect against SessionState being null when accessing HiveClientImpl#conf > -- > > Key: SPARK-13180 > URL: https://issues.apache.org/jira/browse/SPARK-13180 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Ted Yu >Priority: Minor > Attachments: spark-13180-util.patch > > > See this thread http://search-hadoop.com/m/q3RTtFoTDi2HVCrM1 > {code} > java.lang.NullPointerException > at > org.apache.spark.sql.hive.client.ClientWrapper.conf(ClientWrapper.scala:205) > at > org.apache.spark.sql.hive.HiveContext.hiveconf$lzycompute(HiveContext.scala:552) > at org.apache.spark.sql.hive.HiveContext.hiveconf(HiveContext.scala:551) > at > org.apache.spark.sql.hive.HiveContext$$anonfun$configure$1.apply(HiveContext.scala:538) > at > org.apache.spark.sql.hive.HiveContext$$anonfun$configure$1.apply(HiveContext.scala:537) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at scala.collection.immutable.List.foreach(List.scala:318) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) > at scala.collection.AbstractTraversable.map(Traversable.scala:105) > at org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:537) > at > org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250) > at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237) > at org.apache.spark.sql.hive.HiveContext$$anon$2.(HiveContext.scala:457) > at > org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:457) > at 
org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:456) > at org.apache.spark.sql.hive.HiveContext$$anon$3.(HiveContext.scala:473) > at > org.apache.spark.sql.hive.HiveContext.analyzer$lzycompute(HiveContext.scala:473) > at org.apache.spark.sql.hive.HiveContext.analyzer(HiveContext.scala:472) > at > org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34) > at org.apache.spark.sql.DataFrame.(DataFrame.scala:133) > at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52) > at > org.apache.spark.sql.SQLContext.baseRelationToDataFrame(SQLContext.scala:442) > at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:223) > at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:146) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13120) Shade protobuf-java
[ https://issues.apache.org/jira/browse/SPARK-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15136383#comment-15136383 ] Ted Yu commented on SPARK-13120: I don't think this JIRA is limited to the scenario from the thread initially cited. I have dropped that from description. > Shade protobuf-java > --- > > Key: SPARK-13120 > URL: https://issues.apache.org/jira/browse/SPARK-13120 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Ted Yu > > https://groups.google.com/forum/#!topic/protobuf/wAqvtPLBsE8 > PB2 and PB3 are wire compatible, but, protobuf-java is not compatible so > dependency will be a problem. > Shading protobuf-java would provide better experience for downstream projects. > This issue shades com.google.protobuf:protobuf-java as > org.spark-project.protobuf -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13120) Shade protobuf-java
[ https://issues.apache.org/jira/browse/SPARK-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-13120: --- Description: https://groups.google.com/forum/#!topic/protobuf/wAqvtPLBsE8 PB2 and PB3 are wire compatible, but protobuf-java is not, so the dependency will be a problem. Shading protobuf-java would provide a better experience for downstream projects. This issue shades com.google.protobuf:protobuf-java as org.spark-project.protobuf was: See this thread for background information: http://search-hadoop.com/m/q3RTtdkUFK11xQhP1/Spark+not+able+to+fetch+events+from+Amazon+Kinesis This issue shades com.google.protobuf:protobuf-java as org.spark-project.protobuf > Shade protobuf-java > --- > > Key: SPARK-13120 > URL: https://issues.apache.org/jira/browse/SPARK-13120 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Ted Yu > > https://groups.google.com/forum/#!topic/protobuf/wAqvtPLBsE8 > PB2 and PB3 are wire compatible, but protobuf-java is not, so > the dependency will be a problem. > Shading protobuf-java would provide a better experience for downstream projects. > This issue shades com.google.protobuf:protobuf-java as > org.spark-project.protobuf -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
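The relocation described (com.google.protobuf to org.spark-project.protobuf) is typically expressed with the maven-shade-plugin; this fragment is a sketch of that mechanism, not the exact pom change from the issue:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <pattern>com.google.protobuf</pattern>
        <shadedPattern>org.spark-project.protobuf</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```

The plugin rewrites both the bundled classes and Spark's own bytecode references, so downstream projects can pull in any protobuf-java version without clashing.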
[jira] [Commented] (SPARK-13120) Shade protobuf-java
[ https://issues.apache.org/jira/browse/SPARK-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15136320#comment-15136320 ] Ted Yu commented on SPARK-13120: I do see the advantage of shading protobuf-java, which would improve the user experience where wire compatibility across different protobuf versions is provided. > Shade protobuf-java > --- > > Key: SPARK-13120 > URL: https://issues.apache.org/jira/browse/SPARK-13120 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Ted Yu > > See this thread for background information: > http://search-hadoop.com/m/q3RTtdkUFK11xQhP1/Spark+not+able+to+fetch+events+from+Amazon+Kinesis > This issue shades com.google.protobuf:protobuf-java as > org.spark-project.protobuf -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13171) Update promise & future to Promise and Future as the old ones are deprecated
[ https://issues.apache.org/jira/browse/SPARK-13171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15136006#comment-15136006 ] Ted Yu commented on SPARK-13171: https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7/134/ is green. Let's keep an eye on future builds. > Update promise & future to Promise and Future as the old ones are deprecated > > > Key: SPARK-13171 > URL: https://issues.apache.org/jira/browse/SPARK-13171 > Project: Spark > Issue Type: Sub-task >Reporter: holdenk >Assignee: Jakob Odersky >Priority: Trivial > Fix For: 2.0.0 > > > We use the promise and future functions on the concurrent object, both of > which have been deprecated in 2.11 . The full traits are present in Scala > 2.10 as well so this should be a safe migration. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13171) Update promise & future to Promise and Future as the old ones are deprecated
[ https://issues.apache.org/jira/browse/SPARK-13171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15136004#comment-15136004 ] Ted Yu commented on SPARK-13171: No, I have not. bq. Could it be that Hive does some funky reflection Could be. > Update promise & future to Promise and Future as the old ones are deprecated > > > Key: SPARK-13171 > URL: https://issues.apache.org/jira/browse/SPARK-13171 > Project: Spark > Issue Type: Sub-task >Reporter: holdenk >Assignee: Jakob Odersky >Priority: Trivial > Fix For: 2.0.0 > > > We use the promise and future functions on the concurrent object, both of > which have been deprecated in 2.11 . The full traits are present in Scala > 2.10 as well so this should be a safe migration. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13171) Update promise & future to Promise and Future as the old ones are deprecated
[ https://issues.apache.org/jira/browse/SPARK-13171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135972#comment-15135972 ] Ted Yu commented on SPARK-13171: The above Jenkins jobs were running Scala 2.11 > Update promise & future to Promise and Future as the old ones are deprecated > > > Key: SPARK-13171 > URL: https://issues.apache.org/jira/browse/SPARK-13171 > Project: Spark > Issue Type: Sub-task >Reporter: holdenk >Assignee: Jakob Odersky >Priority: Trivial > Fix For: 2.0.0 > > > We use the promise and future functions on the concurrent object, both of > which have been deprecated in 2.11 . The full traits are present in Scala > 2.10 as well so this should be a safe migration. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13171) Update promise & future to Promise and Future as the old ones are deprecated
[ https://issues.apache.org/jira/browse/SPARK-13171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135784#comment-15135784 ] Ted Yu commented on SPARK-13171: Turns out not that trivial. For build against hadoop 2.4 , some test timed out: https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.4/lastCompletedBuild/testReport/org.apache.spark.sql.hive/HiveSparkSubmitSuite/SPARK_8020__set_sql_conf_in_spark_conf/ For build against hadoop 2.7, Jenkins job timed out. > Update promise & future to Promise and Future as the old ones are deprecated > > > Key: SPARK-13171 > URL: https://issues.apache.org/jira/browse/SPARK-13171 > Project: Spark > Issue Type: Sub-task >Reporter: holdenk >Assignee: Jakob Odersky >Priority: Trivial > Fix For: 2.0.0 > > > We use the promise and future functions on the concurrent object, both of > which have been deprecated in 2.11 . The full traits are present in Scala > 2.10 as well so this should be a safe migration. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13180) Protect against SessionState being null when accessing HiveClientImpl#conf
[ https://issues.apache.org/jira/browse/SPARK-13180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-13180: --- Attachment: spark-13180-util.patch Patch with HiveConfUtil.scala For record. > Protect against SessionState being null when accessing HiveClientImpl#conf > -- > > Key: SPARK-13180 > URL: https://issues.apache.org/jira/browse/SPARK-13180 > Project: Spark > Issue Type: Bug >Reporter: Ted Yu >Priority: Minor > Attachments: spark-13180-util.patch > > > See this thread http://search-hadoop.com/m/q3RTtFoTDi2HVCrM1 > {code} > java.lang.NullPointerException > at > org.apache.spark.sql.hive.client.ClientWrapper.conf(ClientWrapper.scala:205) > at > org.apache.spark.sql.hive.HiveContext.hiveconf$lzycompute(HiveContext.scala:552) > at org.apache.spark.sql.hive.HiveContext.hiveconf(HiveContext.scala:551) > at > org.apache.spark.sql.hive.HiveContext$$anonfun$configure$1.apply(HiveContext.scala:538) > at > org.apache.spark.sql.hive.HiveContext$$anonfun$configure$1.apply(HiveContext.scala:537) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at scala.collection.immutable.List.foreach(List.scala:318) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) > at scala.collection.AbstractTraversable.map(Traversable.scala:105) > at org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:537) > at > org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250) > at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237) > at org.apache.spark.sql.hive.HiveContext$$anon$2.(HiveContext.scala:457) > at > org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:457) > at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:456) > at org.apache.spark.sql.hive.HiveContext$$anon$3.(HiveContext.scala:473) > at > 
org.apache.spark.sql.hive.HiveContext.analyzer$lzycompute(HiveContext.scala:473) > at org.apache.spark.sql.hive.HiveContext.analyzer(HiveContext.scala:472) > at > org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34) > at org.apache.spark.sql.DataFrame.(DataFrame.scala:133) > at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52) > at > org.apache.spark.sql.SQLContext.baseRelationToDataFrame(SQLContext.scala:442) > at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:223) > at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:146) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13199) Upgrade apache httpclient version to the latest 4.5 for security
[ https://issues.apache.org/jira/browse/SPARK-13199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15134022#comment-15134022 ] Ted Yu commented on SPARK-13199: This task should be done when there is a Hadoop release that upgrades httpclient to 4.5+. I logged this as a Bug since it deals with CVEs. See HADOOP-12767. > Upgrade apache httpclient version to the latest 4.5 for security > > > Key: SPARK-13199 > URL: https://issues.apache.org/jira/browse/SPARK-13199 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Ted Yu >Priority: Minor > > Various SSL security fixes are needed. > See: CVE-2012-6153, CVE-2011-4461, CVE-2014-3577, CVE-2015-5262.
[jira] [Commented] (SPARK-13204) Replace use of mutable.SynchronizedMap with ConcurrentHashMap
[ https://issues.apache.org/jira/browse/SPARK-13204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15134006#comment-15134006 ] Ted Yu commented on SPARK-13204: Just noticed that the parent JIRA was marked Trivial. I should have done the search first. > Replace use of mutable.SynchronizedMap with ConcurrentHashMap > - > > Key: SPARK-13204 > URL: https://issues.apache.org/jira/browse/SPARK-13204 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Ted Yu >Priority: Trivial > > From http://www.scala-lang.org/api/2.11.1/index.html#scala.collection.mutable.SynchronizedMap : > Synchronization via traits is deprecated as it is inherently unreliable. > Consider java.util.concurrent.ConcurrentHashMap as an alternative. > This issue is to replace the use of mutable.SynchronizedMap and add > scalastyle rule banning mutable.SynchronizedMap
[jira] [Created] (SPARK-13204) Replace use of mutable.SynchronizedMap with ConcurrentHashMap
Ted Yu created SPARK-13204: -- Summary: Replace use of mutable.SynchronizedMap with ConcurrentHashMap Key: SPARK-13204 URL: https://issues.apache.org/jira/browse/SPARK-13204 Project: Spark Issue Type: Bug Reporter: Ted Yu From http://www.scala-lang.org/api/2.11.1/index.html#scala.collection.mutable.SynchronizedMap : Synchronization via traits is deprecated as it is inherently unreliable. Consider java.util.concurrent.ConcurrentHashMap as an alternative. This issue is to replace the use of mutable.SynchronizedMap and add scalastyle rule banning mutable.SynchronizedMap
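The replacement this issue proposes can be sketched in plain Java (a minimal illustration only, not Spark code; the class and method names here are invented for the demo). `ConcurrentHashMap.merge` makes each read-modify-write atomic, so no external lock or deprecated synchronization trait is needed:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentMapDemo {
    // Runs `threads` writers that each increment a shared counter `perThread`
    // times; ConcurrentHashMap.merge performs the increment atomically.
    static int count(int threads, int perThread) {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counts.merge("hits", 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return counts.get("hits");
    }

    public static void main(String[] args) {
        System.out.println(count(2, 1000)); // 2000, with no lost updates
    }
}
```

With a plain `HashMap` in place of `ConcurrentHashMap`, the same race would lose updates (or worse); this is the behavior the deprecated `SynchronizedMap` trait used to paper over.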
[jira] [Created] (SPARK-13203) Add scalastyle rule banning use of mutable.SynchronizedBuffer
Ted Yu created SPARK-13203: -- Summary: Add scalastyle rule banning use of mutable.SynchronizedBuffer Key: SPARK-13203 URL: https://issues.apache.org/jira/browse/SPARK-13203 Project: Spark Issue Type: Improvement Components: Build Reporter: Ted Yu Priority: Minor From SPARK-13164: Building with scala 2.11 results in the warning trait SynchronizedBuffer in package mutable is deprecated: Synchronization via traits is deprecated as it is inherently unreliable. Consider java.util.concurrent.ConcurrentLinkedQueue as an alternative. This issue adds scalastyle rule banning the use of mutable.SynchronizedBuffer
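The alternative named in the deprecation message can likewise be sketched in plain Java (an illustrative demo with made-up names, not Spark code). `ConcurrentLinkedQueue` is a lock-free queue that is safe for concurrent appenders, which is what `mutable.SynchronizedBuffer` was typically used for:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class QueueDemo {
    // Concurrent producers append to a lock-free queue; no element is lost
    // and no external synchronization is required.
    static int fill(int producers, int perProducer) {
        Queue<Integer> buffer = new ConcurrentLinkedQueue<>();
        Thread[] workers = new Thread[producers];
        for (int p = 0; p < producers; p++) {
            workers[p] = new Thread(() -> {
                for (int i = 0; i < perProducer; i++) {
                    buffer.offer(i); // thread-safe append
                }
            });
            workers[p].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        System.out.println(fill(2, 500)); // 1000
    }
}
```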
[jira] [Created] (SPARK-13199) Upgrade apache httpclient version to the latest 4.5 for security
Ted Yu created SPARK-13199: -- Summary: Upgrade apache httpclient version to the latest 4.5 for security Key: SPARK-13199 URL: https://issues.apache.org/jira/browse/SPARK-13199 Project: Spark Issue Type: Bug Components: Build Reporter: Ted Yu Various SSL security fixes are needed. See: CVE-2012-6153, CVE-2011-4461, CVE-2014-3577, CVE-2015-5262.
[jira] [Created] (SPARK-13180) Protect against SessionState being null when accessing HiveClientImpl#conf
Ted Yu created SPARK-13180: -- Summary: Protect against SessionState being null when accessing HiveClientImpl#conf Key: SPARK-13180 URL: https://issues.apache.org/jira/browse/SPARK-13180 Project: Spark Issue Type: Bug Reporter: Ted Yu Priority: Minor See this thread http://search-hadoop.com/m/q3RTtFoTDi2HVCrM1 {code} java.lang.NullPointerException at org.apache.spark.sql.hive.client.ClientWrapper.conf(ClientWrapper.scala:205) at org.apache.spark.sql.hive.HiveContext.hiveconf$lzycompute(HiveContext.scala:552) at org.apache.spark.sql.hive.HiveContext.hiveconf(HiveContext.scala:551) at org.apache.spark.sql.hive.HiveContext$$anonfun$configure$1.apply(HiveContext.scala:538) at org.apache.spark.sql.hive.HiveContext$$anonfun$configure$1.apply(HiveContext.scala:537) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.immutable.List.foreach(List.scala:318) at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at scala.collection.AbstractTraversable.map(Traversable.scala:105) at org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:537) at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250) at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237) at org.apache.spark.sql.hive.HiveContext$$anon$2.(HiveContext.scala:457) at org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:457) at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:456) at org.apache.spark.sql.hive.HiveContext$$anon$3.(HiveContext.scala:473) at org.apache.spark.sql.hive.HiveContext.analyzer$lzycompute(HiveContext.scala:473) at org.apache.spark.sql.hive.HiveContext.analyzer(HiveContext.scala:472) at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34) at org.apache.spark.sql.DataFrame.(DataFrame.scala:133) at 
org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52) at org.apache.spark.sql.SQLContext.baseRelationToDataFrame(SQLContext.scala:442) at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:223) at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:146) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13120) Shade protobuf-java
[ https://issues.apache.org/jira/browse/SPARK-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15128759#comment-15128759 ] Ted Yu commented on SPARK-13120: https://groups.google.com/forum/#!topic/protobuf/wAqvtPLBsE8 PB2 and PB3 are wire compatible, but protobuf-java itself is not binary compatible, so the dependency will be a problem. Shading protobuf-java would provide a better experience for downstream projects. > Shade protobuf-java > --- > > Key: SPARK-13120 > URL: https://issues.apache.org/jira/browse/SPARK-13120 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Ted Yu > > See this thread for background information: > http://search-hadoop.com/m/q3RTtdkUFK11xQhP1/Spark+not+able+to+fetch+events+from+Amazon+Kinesis > This issue shades com.google.protobuf:protobuf-java as > org.spark-project.protobuf
[jira] [Resolved] (SPARK-13084) Utilize @SerialVersionUID to avoid local class incompatibility
[ https://issues.apache.org/jira/browse/SPARK-13084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved SPARK-13084. Resolution: Won't Fix > Utilize @SerialVersionUID to avoid local class incompatibility > -- > > Key: SPARK-13084 > URL: https://issues.apache.org/jira/browse/SPARK-13084 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Ted Yu > > Here is related thread: > http://search-hadoop.com/m/q3RTtSjjdT1BJ4Jr/local+class+incompatible&subj=local+class+incompatible+stream+classdesc+serialVersionUID > RDD extends Serializable but doesn't have @SerialVersionUID() annotation. > Adding @SerialVersionUID would overcome local class incompatibility across > minor releases.
[jira] [Created] (SPARK-13120) Shade protobuf-java
Ted Yu created SPARK-13120: -- Summary: Shade protobuf-java Key: SPARK-13120 URL: https://issues.apache.org/jira/browse/SPARK-13120 Project: Spark Issue Type: Improvement Components: Build Reporter: Ted Yu See this thread for background information: http://search-hadoop.com/m/q3RTtdkUFK11xQhP1/Spark+not+able+to+fetch+events+from+Amazon+Kinesis This issue shades com.google.protobuf:protobuf-java as org.spark-project.protobuf
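A relocation of this kind is typically expressed with the maven-shade-plugin. The fragment below is only an illustrative sketch: the protobuf coordinates and the org.spark-project.protobuf target package come from this issue's description, while the surrounding plugin configuration is generic and would need to be adapted to Spark's actual assembly build (the same pattern applies to the jackson shading proposed in SPARK-13022).

```xml
<!-- Illustrative maven-shade-plugin relocation, not Spark's actual pom.xml -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <!-- Rewrites com.google.protobuf.* class references in the shaded
             jar, so downstream apps can use their own protobuf-java -->
        <pattern>com.google.protobuf</pattern>
        <shadedPattern>org.spark-project.protobuf</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```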
[jira] [Updated] (SPARK-13084) Utilize @SerialVersionUID to avoid local class incompatibility
[ https://issues.apache.org/jira/browse/SPARK-13084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-13084: --- Component/s: Spark Core > Utilize @SerialVersionUID to avoid local class incompatibility > -- > > Key: SPARK-13084 > URL: https://issues.apache.org/jira/browse/SPARK-13084 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Ted Yu > > Here is related thread: > http://search-hadoop.com/m/q3RTtSjjdT1BJ4Jr/local+class+incompatible&subj=local+class+incompatible+stream+classdesc+serialVersionUID > RDD extends Serializable but doesn't have @SerialVersionUID() annotation. > Adding @SerialVersionUID would overcome local class incompatibility across > minor releases.
[jira] [Updated] (SPARK-13084) Utilize @SerialVersionUID to avoid local class incompatibility
[ https://issues.apache.org/jira/browse/SPARK-13084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-13084: --- Description: Here is related thread: http://search-hadoop.com/m/q3RTtSjjdT1BJ4Jr/local+class+incompatible&subj=local+class+incompatible+stream+classdesc+serialVersionUID RDD extends Serializable but doesn't have @SerialVersionUID() annotation. Adding @SerialVersionUID would overcome local class incompatibility across minor releases. was: Here is related thread: http://search-hadoop.com/m/q3RTtSjjdT1BJ4Jr/local+class+incompatible&subj=local+class+incompatible+stream+classdesc+serialVersionUID RDD extends Serializable but doesn't have @SerialVersionUID() annotation. Adding @SerialVersionUID would overcome local class incompatibility. > Utilize @SerialVersionUID to avoid local class incompatibility > -- > > Key: SPARK-13084 > URL: https://issues.apache.org/jira/browse/SPARK-13084 > Project: Spark > Issue Type: Bug >Reporter: Ted Yu > > Here is related thread: > http://search-hadoop.com/m/q3RTtSjjdT1BJ4Jr/local+class+incompatible&subj=local+class+incompatible+stream+classdesc+serialVersionUID > RDD extends Serializable but doesn't have @SerialVersionUID() annotation. > Adding @SerialVersionUID would overcome local class incompatibility across > minor releases. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13084) Utilize @SerialVersionUID to avoid local class incompatibility
Ted Yu created SPARK-13084: -- Summary: Utilize @SerialVersionUID to avoid local class incompatibility Key: SPARK-13084 URL: https://issues.apache.org/jira/browse/SPARK-13084 Project: Spark Issue Type: Bug Reporter: Ted Yu Here is related thread: http://search-hadoop.com/m/q3RTtSjjdT1BJ4Jr/local+class+incompatible&subj=local+class+incompatible+stream+classdesc+serialVersionUID RDD extends Serializable but doesn't have @SerialVersionUID() annotation. Adding @SerialVersionUID would overcome local class incompatibility.
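For reference, Scala's @SerialVersionUID annotation corresponds to Java's static serialVersionUID field: pinning it stops the JVM from deriving a UID from the class shape, so compatible class changes across minor releases no longer trigger "local class incompatible" InvalidClassExceptions. A minimal sketch of the mechanism (VersionedPoint is a made-up class, not part of Spark):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class VersionedPoint implements Serializable {
    // With an explicit UID, deserialization only checks this constant,
    // not a hash of the class's fields and methods.
    private static final long serialVersionUID = 1L;

    final int x;
    final int y;

    VersionedPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Serialize and deserialize in memory, as a stand-in for shipping the
    // object between JVMs built from slightly different class versions.
    static VersionedPoint roundTrip(VersionedPoint p) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bos);
            out.writeObject(p);
            out.flush(); // ObjectOutputStream buffers; flush before reading bytes
            ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
            return (VersionedPoint) in.readObject();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        VersionedPoint p = roundTrip(new VersionedPoint(3, 4));
        System.out.println(p.x + "," + p.y); // 3,4
    }
}
```

In Scala the equivalent is `@SerialVersionUID(1L) class VersionedPoint(...) extends Serializable`; the annotation emits exactly this static field.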
[jira] [Resolved] (SPARK-13022) Shade jackson core
[ https://issues.apache.org/jira/browse/SPARK-13022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved SPARK-13022. Resolution: Won't Fix > Shade jackson core > -- > > Key: SPARK-13022 > URL: https://issues.apache.org/jira/browse/SPARK-13022 > Project: Spark > Issue Type: Improvement > Components: Deploy >Reporter: Ted Yu >Priority: Minor > > See the thread for background information: > http://search-hadoop.com/m/q3RTtYuufRO7LLG > This issue proposes to shade com.fasterxml.jackson.core as > org.spark-project.jackson
[jira] [Created] (SPARK-13022) Shade jackson core
Ted Yu created SPARK-13022: -- Summary: Shade jackson core Key: SPARK-13022 URL: https://issues.apache.org/jira/browse/SPARK-13022 Project: Spark Issue Type: Improvement Components: Deploy Reporter: Ted Yu Priority: Minor See the thread for background information: http://search-hadoop.com/m/q3RTtYuufRO7LLG This issue proposes to shade com.fasterxml.jackson.core as org.spark-project.jackson
[jira] [Updated] (SPARK-12778) Use of Java Unsafe should take endianness into account
[ https://issues.apache.org/jira/browse/SPARK-12778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-12778: --- Description: In Platform.java, methods of Java Unsafe are called directly without considering endianness. In thread, 'Tungsten in a mixed endian environment', Adam Roberts reported data corruption when "spark.sql.tungsten.enabled" is enabled in mixed endian environment. Platform.java should take endianness into account. Below is a copy of Adam's report: I've been experimenting with DataFrame operations in a mixed endian environment - a big endian master with little endian workers. With tungsten enabled I'm encountering data corruption issues. For example, with this simple test code: {code} import org.apache.spark.SparkContext import org.apache.spark._ import org.apache.spark.sql.SQLContext object SimpleSQL { def main(args: Array[String]): Unit = { if (args.length != 1) { println("Not enough args, you need to specify the master url") } val masterURL = args(0) println("Setting up Spark context at: " + masterURL) val sparkConf = new SparkConf val sc = new SparkContext(masterURL, "Unsafe endian test", sparkConf) println("Performing SQL tests") val sqlContext = new SQLContext(sc) println("SQL context set up") val df = sqlContext.read.json("/tmp/people.json") df.show() println("Selecting everyone's age and adding one to it") df.select(df("name"), df("age") + 1).show() println("Showing all people over the age of 21") df.filter(df("age") > 21).show() println("Counting people by age") df.groupBy("age").count().show() } } {code} Instead of getting {code} ++-+ | age|count| ++-+ |null|1| | 19|1| | 30|1| ++-+ {code} I get the following with my mixed endian set up: {code} +---+-+ |age|count| +---+-+ | null|1| |1369094286720630784|72057594037927936| | 30|1| +---+-+ {code} and on another run: {code} +---+-+ |age|count| +---+-+ | 0|72057594037927936| | 19|1| {code} was: In Platform.java, methods of Java Unsafe are called directly 
without considering endianness. In thread, 'Tungsten in a mixed endian environment', Adam Roberts reported data corruption when "spark.sql.tungsten.enabled" is enabled in mixed endian environment. Platform.java should take endianness into account. Below is a copy of Adam's report: I've been experimenting with DataFrame operations in a mixed endian environment - a big endian master with little endian workers. With tungsten enabled I'm encountering data corruption issues. For example, with this simple test code: {code} import org.apache.spark.SparkContext import org.apache.spark._ import org.apache.spark.sql.SQLContext object SimpleSQL { def main(args: Array[String]): Unit = { if (args.length != 1) { println("Not enough args, you need to specify the master url") } val masterURL = args(0) println("Setting up Spark context at: " + masterURL) val sparkConf = new SparkConf val sc = new SparkContext(masterURL, "Unsafe endian test", sparkConf) println("Performing SQL tests") val sqlContext = new SQLContext(sc) println("SQL context set up") val df = sqlContext.read.json("/tmp/people.json") df.show() println("Selecting everyone's age and adding one to it") df.select(df("name"), df("age") + 1).show() println("Showing all people over the age of 21") df.filter(df("age") > 21).show() println("Counting people by age") df.groupBy("age").count().show() } } {code} Instead of getting ++-+ | age|count| ++-+ |null|1| | 19|1| | 30|1| ++-+ I get the following with my mixed endian set up: +---+-+ |age|count| +---+-+ | null|1| |1369094286720630784|72057594037927936| | 30|1| +---+-+ and on another run: +---+-+ |age|count| +---+-+ | 0|72057594037927936| | 19|1| > Use of Java Unsafe should take endianness into account > -- > > Key: SPARK-12778 > URL: https://issues.apache.org/jira/browse/SPARK-12778 > Project: Spark > Issue Type: Bug > Components: Input/Output >Reporter: Ted Yu > > In Platfor
[jira] [Updated] (SPARK-12778) Use of Java Unsafe should take endianness into account
[ https://issues.apache.org/jira/browse/SPARK-12778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-12778: --- Description: In Platform.java, methods of Java Unsafe are called directly without considering endianness. In thread, 'Tungsten in a mixed endian environment', Adam Roberts reported data corruption when "spark.sql.tungsten.enabled" is enabled in mixed endian environment. Platform.java should take endianness into account. Below is a copy of Adam's report: I've been experimenting with DataFrame operations in a mixed endian environment - a big endian master with little endian workers. With tungsten enabled I'm encountering data corruption issues. For example, with this simple test code: {code} import org.apache.spark.SparkContext import org.apache.spark._ import org.apache.spark.sql.SQLContext object SimpleSQL { def main(args: Array[String]): Unit = { if (args.length != 1) { println("Not enough args, you need to specify the master url") } val masterURL = args(0) println("Setting up Spark context at: " + masterURL) val sparkConf = new SparkConf val sc = new SparkContext(masterURL, "Unsafe endian test", sparkConf) println("Performing SQL tests") val sqlContext = new SQLContext(sc) println("SQL context set up") val df = sqlContext.read.json("/tmp/people.json") df.show() println("Selecting everyone's age and adding one to it") df.select(df("name"), df("age") + 1).show() println("Showing all people over the age of 21") df.filter(df("age") > 21).show() println("Counting people by age") df.groupBy("age").count().show() } } {code} Instead of getting ++-+ | age|count| ++-+ |null|1| | 19|1| | 30|1| ++-+ I get the following with my mixed endian set up: +---+-+ |age|count| +---+-+ | null|1| |1369094286720630784|72057594037927936| | 30|1| +---+-+ and on another run: +---+-+ |age|count| +---+-+ | 0|72057594037927936| | 19|1| was: In Platform.java, methods of Java Unsafe are called directly without considering endianness. 
In thread, 'Tungsten in a mixed endian environment', Adam Roberts reported data corruption when "spark.sql.tungsten.enabled" is enabled in mixed endian environment. Platform.java should take endianness into account. > Use of Java Unsafe should take endianness into account > -- > > Key: SPARK-12778 > URL: https://issues.apache.org/jira/browse/SPARK-12778 > Project: Spark > Issue Type: Bug > Components: Input/Output >Reporter: Ted Yu > > In Platform.java, methods of Java Unsafe are called directly without > considering endianness. > In thread, 'Tungsten in a mixed endian environment', Adam Roberts reported > data corruption when "spark.sql.tungsten.enabled" is enabled in mixed endian > environment. > Platform.java should take endianness into account. > Below is a copy of Adam's report: > I've been experimenting with DataFrame operations in a mixed endian > environment - a big endian master with little endian workers. With tungsten > enabled I'm encountering data corruption issues. > For example, with this simple test code: > {code} > import org.apache.spark.SparkContext > import org.apache.spark._ > import org.apache.spark.sql.SQLContext > object SimpleSQL { > def main(args: Array[String]): Unit = { > if (args.length != 1) { > println("Not enough args, you need to specify the master url") > } > val masterURL = args(0) > println("Setting up Spark context at: " + masterURL) > val sparkConf = new SparkConf > val sc = new SparkContext(masterURL, "Unsafe endian test", sparkConf) > println("Performing SQL tests") > val sqlContext = new SQLContext(sc) > println("SQL context set up") > val df = sqlContext.read.json("/tmp/people.json") > df.show() > println("Selecting everyone's age and adding one to it") > df.select(df("name"), df("age") + 1).show() > println("Showing all people over the age of 21") > df.filter(df("age") > 21).show() > println("Counting people by age") > df.groupBy("age").count().show() > } > } > {code} > Instead of getting > ++-+ > | age|count| > ++-+ > 
|null|1| > | 19|1| > | 30|1| > ++-+ > I get the following with my mixed endian set up: > +---+-+ > |age|count| > +---+-+ > | null|
[jira] [Commented] (SPARK-12778) Use of Java Unsafe should take endianness into account
[ https://issues.apache.org/jira/browse/SPARK-12778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094090#comment-15094090 ] Ted Yu commented on SPARK-12778: I have seen SPARK-12555 but it didn't identify where in the codebase the problem is. I checked unsafe/src/main/java/org/apache/spark/unsafe/Platform.java in master branch where endianness is not taken care of. > Use of Java Unsafe should take endianness into account > -- > > Key: SPARK-12778 > URL: https://issues.apache.org/jira/browse/SPARK-12778 > Project: Spark > Issue Type: Bug > Components: Input/Output >Reporter: Ted Yu > > In Platform.java, methods of Java Unsafe are called directly without > considering endianness. > In thread, 'Tungsten in a mixed endian environment', Adam Roberts reported > data corruption when "spark.sql.tungsten.enabled" is enabled in mixed endian > environment. > Platform.java should take endianness into account.
[jira] [Comment Edited] (SPARK-12778) Use of Java Unsafe should take endianness into account
[ https://issues.apache.org/jira/browse/SPARK-12778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094078#comment-15094078 ] Ted Yu edited comment on SPARK-12778 at 1/12/16 3:33 PM: - I did perform a search in JIRA for Unsafe related Spark JIRAs but didn't find any unresolved one. I checked master branch code base before logging this JIRA. When the above thread is indexed by search-hadoop, I will add a link. was (Author: yuzhih...@gmail.com): I did perform a search in JIRA for Unsafe related Spark JIRAs but didn't find any unresolved one. I checked master branch code base before logging this JIRA. > Use of Java Unsafe should take endianness into account > -- > > Key: SPARK-12778 > URL: https://issues.apache.org/jira/browse/SPARK-12778 > Project: Spark > Issue Type: Bug > Components: Input/Output >Reporter: Ted Yu > > In Platform.java, methods of Java Unsafe are called directly without > considering endianness. > In thread, 'Tungsten in a mixed endian environment', Adam Roberts reported > data corruption when "spark.sql.tungsten.enabled" is enabled in mixed endian > environment. > Platform.java should take endianness into account.
[jira] [Updated] (SPARK-12778) Use of Java Unsafe should take endianness into account
[ https://issues.apache.org/jira/browse/SPARK-12778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-12778: --- Component/s: Input/Output > Use of Java Unsafe should take endianness into account > -- > > Key: SPARK-12778 > URL: https://issues.apache.org/jira/browse/SPARK-12778 > Project: Spark > Issue Type: Bug > Components: Input/Output >Reporter: Ted Yu > > In Platform.java, methods of Java Unsafe are called directly without > considering endianness. > In thread, 'Tungsten in a mixed endian environment', Adam Roberts reported > data corruption when "spark.sql.tungsten.enabled" is enabled in mixed endian > environment. > Platform.java should take endianness into account.
[jira] [Commented] (SPARK-12778) Use of Java Unsafe should take endianness into account
[ https://issues.apache.org/jira/browse/SPARK-12778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094078#comment-15094078 ] Ted Yu commented on SPARK-12778: I did perform a search in JIRA for Unsafe related Spark JIRAs but didn't find any unresolved one. I checked master branch code base before logging this JIRA. > Use of Java Unsafe should take endianness into account > -- > > Key: SPARK-12778 > URL: https://issues.apache.org/jira/browse/SPARK-12778 > Project: Spark > Issue Type: Bug >Reporter: Ted Yu > > In Platform.java, methods of Java Unsafe are called directly without > considering endianness. > In thread, 'Tungsten in a mixed endian environment', Adam Roberts reported > data corruption when "spark.sql.tungsten.enabled" is enabled in mixed endian > environment. > Platform.java should take endianness into account.
[jira] [Created] (SPARK-12778) Use of Java Unsafe should take endianness into account
Ted Yu created SPARK-12778: -- Summary: Use of Java Unsafe should take endianness into account Key: SPARK-12778 URL: https://issues.apache.org/jira/browse/SPARK-12778 Project: Spark Issue Type: Bug Reporter: Ted Yu In Platform.java, methods of Java Unsafe are called directly without considering endianness. In thread, 'Tungsten in a mixed endian environment', Adam Roberts reported data corruption when "spark.sql.tungsten.enabled" is enabled in mixed endian environment. Platform.java should take endianness into account.
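The failure mode behind the mixed-endian corruption can be illustrated without Unsafe (EndianDemo is a made-up class for this demo, not Spark code): a raw memory read behaves like `ByteOrder.nativeOrder()`, so the same four bytes decode to different integers on big- and little-endian hosts unless someone normalizes at the boundary.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianDemo {
    // Reinterpret the same bytes under an explicit byte order; a raw
    // Unsafe.getInt behaves like reading with ByteOrder.nativeOrder().
    static int readInt(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    public static void main(String[] args) {
        byte[] raw = {0x00, 0x00, 0x00, 0x2A}; // 42, written big-endian
        System.out.println(readInt(raw, ByteOrder.BIG_ENDIAN));    // 42
        System.out.println(readInt(raw, ByteOrder.LITTLE_ENDIAN)); // 704643072
        // One remedy is to byte-swap at the boundary when nativeOrder()
        // differs from the order the bytes were written in:
        System.out.println(Integer.reverseBytes(704643072)); // 42
    }
}
```

The 704643072 here (0x2A000000) is the same shape of garbage as the 72057594037927936 counts in Adam's report, which is 0x0100000000000000, i.e. a byte-swapped long 1.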
[jira] [Resolved] (SPARK-4066) Make whether maven builds fails on scalastyle violation configurable
[ https://issues.apache.org/jira/browse/SPARK-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved SPARK-4066. --- Resolution: Later Haven't heard much feedback so far. Resolving for now. > Make whether maven builds fails on scalastyle violation configurable > > > Key: SPARK-4066 > URL: https://issues.apache.org/jira/browse/SPARK-4066 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Ted Yu >Priority: Minor > Labels: style > Attachments: spark-4066-v1.txt > > > Here is the thread Koert started: > http://search-hadoop.com/m/JW1q5j8z422/scalastyle+annoys+me+a+little+bit&subj=scalastyle+annoys+me+a+little+bit > It would be more flexible if whether the maven build fails on scalastyle > violations were configurable.
[jira] [Resolved] (SPARK-12527) Add private val after @transient for kinesis-asl module
[ https://issues.apache.org/jira/browse/SPARK-12527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved SPARK-12527. Resolution: Duplicate Dup of SPARK-1252 > Add private val after @transient for kinesis-asl module > --- > > Key: SPARK-12527 > URL: https://issues.apache.org/jira/browse/SPARK-12527 > Project: Spark > Issue Type: Bug >Reporter: Ted Yu >Assignee: Apache Spark > > In SBT build using Scala 2.11, the following warnings were reported which > resulted in build failure > (https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/SPARK-branch-1.6-COMPILE-SBT-SCALA-2.11/3/consoleFull): > {code} > [error] [warn] > /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala:33: > no valid targets for annotation on value _ssc - it is discarded unused. You > may specify targets with meta-annotations, e.g. @(transient @param) > [error] [warn] @transient _ssc: StreamingContext, > [error] [warn] > [error] [warn] > /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:73: > no valid targets for annotation on value sc - it is discarded unused. You > may specify targets with meta-annotations, e.g. @(transient @param) > [error] [warn] @transient sc: SparkContext, > [error] [warn] > [error] [warn] > /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:76: > no valid targets for annotation on value blockIds - it is discarded > unused. You may specify targets with meta-annotations, e.g. @(transient > @param) > [error] [warn] @transient blockIds: Array[BlockId], > [error] [warn] > [error] [warn] > /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:78: > no valid targets for annotation on value isBlockIdValid - it is discarded > unused. 
You may specify targets with meta-annotations, e.g. @(transient > @param) > [error] [warn] @transient isBlockIdValid: Array[Boolean] = Array.empty, > {code}
[jira] [Issue Comment Deleted] (SPARK-12527) Add private val after @transient for kinesis-asl module
[ https://issues.apache.org/jira/browse/SPARK-12527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-12527: --- Comment: was deleted (was: https://github.com/apache/spark/pull/10482) > Add private val after @transient for kinesis-asl module > --- > > Key: SPARK-12527 > URL: https://issues.apache.org/jira/browse/SPARK-12527 > Project: Spark > Issue Type: Bug >Reporter: Ted Yu > > In SBT build using Scala 2.11, the following warnings were reported which > resulted in build failure > (https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/SPARK-branch-1.6-COMPILE-SBT-SCALA-2.11/3/consoleFull): > {code} > [error] [warn] > /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala:33: > no valid targets for annotation on value _ssc - it is discarded unused. You > may specify targets with meta-annotations, e.g. @(transient @param) > [error] [warn] @transient _ssc: StreamingContext, > [error] [warn] > [error] [warn] > /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:73: > no valid targets for annotation on value sc - it is discarded unused. You > may specify targets with meta-annotations, e.g. @(transient @param) > [error] [warn] @transient sc: SparkContext, > [error] [warn] > [error] [warn] > /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:76: > no valid targets for annotation on value blockIds - it is discarded > unused. You may specify targets with meta-annotations, e.g. @(transient > @param) > [error] [warn] @transient blockIds: Array[BlockId], > [error] [warn] > [error] [warn] > /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:78: > no valid targets for annotation on value isBlockIdValid - it is discarded > unused. 
You may specify targets with meta-annotations, e.g. @(transient > @param) > [error] [warn] @transient isBlockIdValid: Array[Boolean] = Array.empty, > {code}
[jira] [Updated] (SPARK-12527) Add private val after @transient for kinesis-asl module
[ https://issues.apache.org/jira/browse/SPARK-12527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-12527: --- Description: In SBT build using Scala 2.11, the following warnings were reported which resulted in build failure (https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/SPARK-branch-1.6-COMPILE-SBT-SCALA-2.11/3/consoleFull): {code} [error] [warn] /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala:33: no valid targets for annotation on value _ssc - it is discarded unused. You may specify targets with meta-annotations, e.g. @(transient @param) [error] [warn] @transient _ssc: StreamingContext, [error] [warn] [error] [warn] /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:73: no valid targets for annotation on value sc - it is discarded unused. You may specify targets with meta-annotations, e.g. @(transient @param) [error] [warn] @transient sc: SparkContext, [error] [warn] [error] [warn] /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:76: no valid targets for annotation on value blockIds - it is discarded unused. You may specify targets with meta-annotations, e.g. @(transient @param) [error] [warn] @transient blockIds: Array[BlockId], [error] [warn] [error] [warn] /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:78: no valid targets for annotation on value isBlockIdValid - it is discarded unused. You may specify targets with meta-annotations, e.g. 
@(transient @param) [error] [warn] @transient isBlockIdValid: Array[Boolean] = Array.empty, {code} was: In SBT build using Scala 2.11, the following warnings were reported which resulted in build failure: {code} [error] [warn] /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala:33: no valid targets for annotation on value _ssc - it is discarded unused. You may specify targets with meta-annotations, e.g. @(transient @param) [error] [warn] @transient _ssc: StreamingContext, [error] [warn] [error] [warn] /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:73: no valid targets for annotation on value sc - it is discarded unused. You may specify targets with meta-annotations, e.g. @(transient @param) [error] [warn] @transient sc: SparkContext, [error] [warn] [error] [warn] /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:76: no valid targets for annotation on value blockIds - it is discarded unused. You may specify targets with meta-annotations, e.g. @(transient @param) [error] [warn] @transient blockIds: Array[BlockId], [error] [warn] [error] [warn] /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:78: no valid targets for annotation on value isBlockIdValid - it is discarded unused. You may specify targets with meta-annotations, e.g. 
@(transient @param) [error] [warn] @transient isBlockIdValid: Array[Boolean] = Array.empty, {code} > Add private val after @transient for kinesis-asl module > --- > > Key: SPARK-12527 > URL: https://issues.apache.org/jira/browse/SPARK-12527 > Project: Spark > Issue Type: Bug >Reporter: Ted Yu > > In SBT build using Scala 2.11, the following warnings were reported which > resulted in build failure > (https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/SPARK-branch-1.6-COMPILE-SBT-SCALA-2.11/3/consoleFull): > {code} > [error] [warn] > /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala:33: > no valid targets for annotation on value _ssc - it is discarded unused. You > may specify targets with meta-annotations, e.g. @(transient @param) > [error] [warn] @transient _ssc: StreamingContext, > [error] [warn] > [error] [warn] > /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:73: > no valid targets for annotation on value sc - it is discarded unused. You > may specify targets with meta-annotations, e.g. @(transient @param) > [error] [warn] @transient sc: SparkContext, > [error] [warn] > [error] [warn] > /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:76: > no valid targets for annotation on value blockIds - it is di
[jira] [Commented] (SPARK-12527) Add private val after @transient for kinesis-asl module
[ https://issues.apache.org/jira/browse/SPARK-12527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15071818#comment-15071818 ] Ted Yu commented on SPARK-12527: https://github.com/apache/spark/pull/10482 > Add private val after @transient for kinesis-asl module > --- > > Key: SPARK-12527 > URL: https://issues.apache.org/jira/browse/SPARK-12527 > Project: Spark > Issue Type: Bug >Reporter: Ted Yu > > In SBT build using Scala 2.11, the following warnings were reported which > resulted in build failure: > {code} > [error] [warn] > /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala:33: > no valid targets for annotation on value _ssc - it is discarded unused. You > may specify targets with meta-annotations, e.g. @(transient @param) > [error] [warn] @transient _ssc: StreamingContext, > [error] [warn] > [error] [warn] > /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:73: > no valid targets for annotation on value sc - it is discarded unused. You > may specify targets with meta-annotations, e.g. @(transient @param) > [error] [warn] @transient sc: SparkContext, > [error] [warn] > [error] [warn] > /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:76: > no valid targets for annotation on value blockIds - it is discarded > unused. You may specify targets with meta-annotations, e.g. @(transient > @param) > [error] [warn] @transient blockIds: Array[BlockId], > [error] [warn] > [error] [warn] > /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:78: > no valid targets for annotation on value isBlockIdValid - it is discarded > unused. You may specify targets with meta-annotations, e.g. 
@(transient > @param) > [error] [warn] @transient isBlockIdValid: Array[Boolean] = Array.empty, > {code}
[jira] [Created] (SPARK-12527) Add private val after @transient for kinesis-asl module
Ted Yu created SPARK-12527: -- Summary: Add private val after @transient for kinesis-asl module Key: SPARK-12527 URL: https://issues.apache.org/jira/browse/SPARK-12527 Project: Spark Issue Type: Bug Reporter: Ted Yu In SBT build using Scala 2.11, the following warnings were reported which resulted in build failure: {code} [error] [warn] /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala:33: no valid targets for annotation on value _ssc - it is discarded unused. You may specify targets with meta-annotations, e.g. @(transient @param) [error] [warn] @transient _ssc: StreamingContext, [error] [warn] [error] [warn] /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:73: no valid targets for annotation on value sc - it is discarded unused. You may specify targets with meta-annotations, e.g. @(transient @param) [error] [warn] @transient sc: SparkContext, [error] [warn] [error] [warn] /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:76: no valid targets for annotation on value blockIds - it is discarded unused. You may specify targets with meta-annotations, e.g. @(transient @param) [error] [warn] @transient blockIds: Array[BlockId], [error] [warn] [error] [warn] /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:78: no valid targets for annotation on value isBlockIdValid - it is discarded unused. You may specify targets with meta-annotations, e.g. @(transient @param) [error] [warn] @transient isBlockIdValid: Array[Boolean] = Array.empty, {code}
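The warnings above come from Scala 2.11 tightening annotation-target checks: on a plain constructor parameter, @transient has no field to attach to and is silently discarded. A minimal sketch of the fix named in the issue summary (parameter names taken from the warnings; the surrounding class signatures are simplified and assume the Spark classpath):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.storage.BlockId

// Before (Scala 2.11 warns "no valid targets for annotation"):
//   class KinesisBackedBlockRDD(@transient sc: SparkContext, ...)
//
// After: adding `private val` turns each parameter into a field, which
// gives @transient a valid target. Alternatively, a meta-annotation such
// as @(transient @param) pins the annotation to the parameter itself.
class KinesisBackedBlockRDD(
    @transient private val sc: SparkContext,
    @transient private val blockIds: Array[BlockId],
    @transient private val isBlockIdValid: Array[Boolean] = Array.empty)
```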
[jira] [Created] (SPARK-12365) Use ShutdownHookManager where Runtime.getRuntime.addShutdownHook() is called
Ted Yu created SPARK-12365: -- Summary: Use ShutdownHookManager where Runtime.getRuntime.addShutdownHook() is called Key: SPARK-12365 URL: https://issues.apache.org/jira/browse/SPARK-12365 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Ted Yu Priority: Minor SPARK-9886 fixed the call to Runtime.getRuntime.addShutdownHook() in ExternalBlockStore.scala. This issue intends to address the remaining usages of Runtime.getRuntime.addShutdownHook().
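A hedged sketch of the substitution this issue proposes. ShutdownHookManager is Spark's internal utility (visible only inside the Spark codebase), and the cleanup() body here is hypothetical:

```scala
import org.apache.spark.util.ShutdownHookManager

def cleanup(): Unit = { /* hypothetical resource teardown */ }

// Before: a raw JVM hook. This throws IllegalStateException if a hook is
// registered while shutdown is already in progress, and hook ordering is
// unspecified.
Runtime.getRuntime.addShutdownHook(new Thread {
  override def run(): Unit = cleanup()
})

// After (sketch): Spark's ShutdownHookManager runs hooks in priority
// order and tolerates late registration during shutdown.
ShutdownHookManager.addShutdownHook { () => cleanup() }
```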
[jira] [Updated] (SPARK-12181) Check Cached unaligned-access capability before using Unsafe
[ https://issues.apache.org/jira/browse/SPARK-12181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-12181: --- Attachment: spark-12181.txt > Check Cached unaligned-access capability before using Unsafe > > > Key: SPARK-12181 > URL: https://issues.apache.org/jira/browse/SPARK-12181 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Ted Yu > Attachments: spark-12181.txt > > > For MemoryMode.OFF_HEAP, Unsafe.getInt etc. are used with no restriction. > However, the Oracle implementation uses these methods only if the class > variable unaligned (commented as "Cached unaligned-access capability") is > true, which seems to be calculated based on whether the architecture is i386, x86, > amd64, or x86_64. > I think we should perform a similar check for the use of Unsafe.
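The architecture test the description refers to can be sketched directly: the Oracle JDK derives its internal "Cached unaligned-access capability" flag from the os.arch system property, and Spark could guard its Unsafe calls the same way. The architecture list is taken from the description above; the variable name mirrors the JDK's, but this is an illustrative sketch, not Spark's actual patch:

```scala
// Treat unaligned Unsafe loads/stores as safe only on architectures
// known to support unaligned memory access.
val unaligned: Boolean = {
  val arch = System.getProperty("os.arch", "")
  Set("i386", "x86", "amd64", "x86_64").contains(arch)
}
```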
[jira] [Updated] (SPARK-12181) Check Cached unaligned-access capability before using Unsafe
[ https://issues.apache.org/jira/browse/SPARK-12181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-12181: --- Component/s: Spark Core > Check Cached unaligned-access capability before using Unsafe > > > Key: SPARK-12181 > URL: https://issues.apache.org/jira/browse/SPARK-12181 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Ted Yu > > For MemoryMode.OFF_HEAP, Unsafe.getInt etc. are used with no restriction. > However, the Oracle implementation uses these methods only if the class > variable unaligned (commented as "Cached unaligned-access capability") is > true, which seems to be calculated based on whether the architecture is i386, x86, > amd64, or x86_64. > I think we should perform a similar check for the use of Unsafe.