[jira] [Updated] (SPARK-42090) Introduce sasl retry count in RetryingBlockTransferor

2023-01-16 Thread Ted Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-42090:
---
Description: 
Previously, a boolean variable, saslTimeoutSeen, was used in 
RetryingBlockTransferor. However, a boolean flag cannot correctly handle the 
following scenario:

1. SaslTimeoutException
2. IOException
3. SaslTimeoutException
4. IOException

Even though the IOException at #2 is retried (incrementing retryCount), 
retryCount would be cleared again at step #4.
Since the intent of saslTimeoutSeen is to undo only the increments caused by retrying 
SaslTimeoutException, we should keep a counter of SaslTimeoutException retries 
and subtract the value of that counter from retryCount, as sketched below.
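A minimal, self-contained sketch of the counter-based accounting; the class and member names are illustrative and do not mirror the actual RetryingBlockTransferor (which is Java):
{code}
// Hypothetical sketch only; names and structure are illustrative.
class SaslTimeoutException(msg: String) extends RuntimeException(msg)

class RetryAccounting(maxRetries: Int) {
  private var retryCount = 0      // retries counted against the retry budget
  private var saslRetryCount = 0  // retries caused by SaslTimeoutException

  def shouldRetry(t: Throwable): Boolean = {
    if (t.isInstanceOf[SaslTimeoutException]) {
      saslRetryCount += 1
    } else {
      // Undo only the increments that came from SASL timeout retries,
      // instead of clearing retryCount outright as the boolean flag did.
      retryCount -= saslRetryCount
      saslRetryCount = 0
    }
    retryCount += 1
    retryCount <= maxRetries
  }
}
{code}
With this accounting, the IOExceptions at #2 and #4 each consume exactly one retry, while the SASL timeout retries are subtracted back out.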

  was:
Previously a boolean variable, saslTimeoutSeen, was used. However, the boolean 
variable wouldn't cover the following scenario:

1. SaslTimeoutException
2. IOException
3. SaslTimeoutException
4. IOException

Even though IOException at #2 is retried (resulting in increment of 
retryCount), the retryCount would be cleared at step #4.
Since the intention of saslTimeoutSeen is to undo the increment due to retrying 
SaslTimeoutException, we should keep a counter for SaslTimeoutException retries 
and subtract the value of this counter from retryCount.


> Introduce sasl retry count in RetryingBlockTransferor
> -
>
> Key: SPARK-42090
> URL: https://issues.apache.org/jira/browse/SPARK-42090
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Ted Yu
>Priority: Major
>
> Previously a boolean variable, saslTimeoutSeen, was used in 
> RetryingBlockTransferor. However, the boolean variable wouldn't cover the 
> following scenario:
> 1. SaslTimeoutException
> 2. IOException
> 3. SaslTimeoutException
> 4. IOException
> Even though IOException at #2 is retried (resulting in increment of 
> retryCount), the retryCount would be cleared at step #4.
> Since the intention of saslTimeoutSeen is to undo the increment due to 
> retrying SaslTimeoutException, we should keep a counter for 
> SaslTimeoutException retries and subtract the value of this counter from 
> retryCount.






[jira] [Created] (SPARK-42090) Introduce sasl retry count in RetryingBlockTransferor

2023-01-16 Thread Ted Yu (Jira)
Ted Yu created SPARK-42090:
--

 Summary: Introduce sasl retry count in RetryingBlockTransferor
 Key: SPARK-42090
 URL: https://issues.apache.org/jira/browse/SPARK-42090
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Ted Yu


Previously a boolean variable, saslTimeoutSeen, was used. However, the boolean 
variable wouldn't cover the following scenario:

1. SaslTimeoutException
2. IOException
3. SaslTimeoutException
4. IOException

Even though IOException at #2 is retried (resulting in increment of 
retryCount), the retryCount would be cleared at step #4.
Since the intention of saslTimeoutSeen is to undo the increment due to retrying 
SaslTimeoutException, we should keep a counter for SaslTimeoutException retries 
and subtract the value of this counter from retryCount.






[jira] [Created] (SPARK-41853) Use Map in place of SortedMap for ErrorClassesJsonReader

2023-01-02 Thread Ted Yu (Jira)
Ted Yu created SPARK-41853:
--

 Summary: Use Map in place of SortedMap for ErrorClassesJsonReader
 Key: SPARK-41853
 URL: https://issues.apache.org/jira/browse/SPARK-41853
 Project: Spark
  Issue Type: Task
  Components: Spark Core
Affects Versions: 3.2.3
Reporter: Ted Yu


The use of SortedMap in ErrorClassesJsonReader was mostly for making tests 
easier to write.

This issue replaces SortedMap with Map, since lookups in a SortedMap are slower 
than in a plain Map.
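A minimal illustration of the swap, assuming a simplified stand-in for the value type held by the reader:
{code}
import scala.collection.immutable.SortedMap

object ErrorMapExample extends App {
  // Simplified stand-in for the value type held by ErrorClassesJsonReader (illustrative only).
  case class ErrorInfo(messageFormat: String)

  // Before: a tree-based SortedMap, which keeps keys ordered but pays O(log n) per lookup.
  val sortedErrors: SortedMap[String, ErrorInfo] =
    SortedMap("DIVIDE_BY_ZERO" -> ErrorInfo("Division by zero"))

  // After: a plain immutable Map with effectively constant-time lookups; tests that need a
  // deterministic order can sort the keys explicitly instead.
  val errors: Map[String, ErrorInfo] =
    Map("DIVIDE_BY_ZERO" -> ErrorInfo("Division by zero"))

  assert(errors("DIVIDE_BY_ZERO") == sortedErrors("DIVIDE_BY_ZERO"))
}
{code}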






[jira] [Created] (SPARK-41705) Move generate_protos.sh to dev/

2022-12-25 Thread Ted Yu (Jira)
Ted Yu created SPARK-41705:
--

 Summary: Move generate_protos.sh to dev/ 
 Key: SPARK-41705
 URL: https://issues.apache.org/jira/browse/SPARK-41705
 Project: Spark
  Issue Type: Task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Ted Yu


connector/connect/dev contains only one script. Moving generate_protos.sh to 
dev/ follows the existing practice for other scripts.






[jira] [Created] (SPARK-41419) [K8S] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread Ted Yu (Jira)
Ted Yu created SPARK-41419:
--

 Summary: [K8S] Decrement PVC_COUNTER when the pod deletion happens 
 Key: SPARK-41419
 URL: https://issues.apache.org/jira/browse/SPARK-41419
 Project: Spark
  Issue Type: Task
  Components: Kubernetes
Affects Versions: 3.4.0
Reporter: Ted Yu


Commit cc55de3 introduced PVC_COUNTER to track the outstanding number of PVCs.

PVC_COUNTER should only be decremented when the pod deletion happens (in 
response to an error).

If the PVC isn't created successfully (in which case PVC_COUNTER isn't incremented, 
possibly because execution never reaches the resource(pvc).create() call), we 
shouldn't decrement the counter.
The variable `success` tracks the progress of PVC creation (see the sketch below):

value 0 means the PVC is not created.
value 1 means the PVC has been created.
value 2 means the PVC has been created but, due to a subsequent error, the pod is 
deleted.
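A hedged, self-contained sketch of this bookkeeping; names and structure are illustrative and do not match the actual Spark-on-Kubernetes allocator code:
{code}
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical sketch only.
object PvcProgress {
  val NotCreated = 0        // PVC never created; PVC_COUNTER was never incremented
  val Created = 1           // PVC created; PVC_COUNTER was incremented
  val CreatedPodDeleted = 2 // PVC created, but the pod was deleted due to a later error
}

class PvcCounterSketch(createPvc: () => Unit, createPod: () => Unit, deletePod: () => Unit) {
  val pvcCounter = new AtomicInteger(0)

  def allocate(): Unit = {
    var success = PvcProgress.NotCreated
    try {
      createPvc()
      pvcCounter.incrementAndGet()
      success = PvcProgress.Created
      createPod()
    } catch {
      case e: Exception =>
        if (success == PvcProgress.Created) {
          // The pod is deleted in response to the error, so undo the earlier increment.
          deletePod()
          pvcCounter.decrementAndGet()
          success = PvcProgress.CreatedPodDeleted
        }
        // If the PVC was never created, the counter was never incremented,
        // so there is nothing to decrement.
        throw e
    }
  }
}
{code}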








[jira] [Resolved] (SPARK-41274) Bump Kubernetes Client Version to 6.2.0

2022-11-26 Thread Ted Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved SPARK-41274.

Resolution: Duplicate

This is a duplicate of commit 02a2242a45062755bf7e20805958d5bdf1f5ed74.

> Bump Kubernetes Client Version to 6.2.0
> ---
>
> Key: SPARK-41274
> URL: https://issues.apache.org/jira/browse/SPARK-41274
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.4.0
>Reporter: Ted Yu
>Priority: Major
>
> Bump Kubernetes Client Version to 6.2.0






[jira] [Created] (SPARK-41274) Bump Kubernetes Client Version to 6.2.0

2022-11-26 Thread Ted Yu (Jira)
Ted Yu created SPARK-41274:
--

 Summary: Bump Kubernetes Client Version to 6.2.0
 Key: SPARK-41274
 URL: https://issues.apache.org/jira/browse/SPARK-41274
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.4.0
Reporter: Ted Yu


Bump Kubernetes Client Version to 6.2.0






[jira] [Created] (SPARK-41197) Upgrade Kafka version to 3.3 release

2022-11-18 Thread Ted Yu (Jira)
Ted Yu created SPARK-41197:
--

 Summary: Upgrade Kafka version to 3.3 release
 Key: SPARK-41197
 URL: https://issues.apache.org/jira/browse/SPARK-41197
 Project: Spark
  Issue Type: Improvement
  Components: Java API
Affects Versions: 3.3.1
Reporter: Ted Yu


Kafka 3.3 has been released.

This issue upgrades the Kafka dependency to the 3.3 release.






[jira] [Updated] (SPARK-40508) Treat unknown partitioning as UnknownPartitioning

2022-09-20 Thread Ted Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-40508:
---
Description: 
When running a Spark application against Spark 3.3, I see the following:
{code}
java.lang.IllegalArgumentException: Unsupported data source V2 partitioning 
type: CustomPartitioning
at 
org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioning$$anonfun$apply$1.applyOrElse(V2ScanPartitioning.scala:46)
at 
org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioning$$anonfun$apply$1.applyOrElse(V2ScanPartitioning.scala:34)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
{code}
The CustomPartitioning works fine with Spark 3.2.1.
This issue proposes to relax the check and treat any unrecognized partitioning the same 
way as UnknownPartitioning, as sketched below.
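A hedged sketch of the proposed fallback, with placeholder types standing in for the connector-provided Partitioning classes (this is not the actual V2ScanPartitioning rule):
{code}
// Illustrative types only.
sealed trait ConnectorPartitioning
final case class KeyGroupedLike(keys: Seq[String]) extends ConnectorPartitioning
final case class CustomPartitioningLike(name: String) extends ConnectorPartitioning

sealed trait CatalystPartitioning
final case class KeyGrouped(keys: Seq[String]) extends CatalystPartitioning
case object Unknown extends CatalystPartitioning

object V2PartitioningSketch {
  def convert(p: ConnectorPartitioning): CatalystPartitioning = p match {
    case KeyGroupedLike(keys) => KeyGrouped(keys)
    // Before: any unrecognized type raised IllegalArgumentException("Unsupported data
    // source V2 partitioning type: ..."). After: fall back to UnknownPartitioning so
    // that custom partitionings keep working.
    case _ => Unknown
  }
}
{code}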

  was:
When running spark application against spark 3.3, I see the following :
```
java.lang.IllegalArgumentException: Unsupported data source V2 partitioning 
type: CustomPartitioning
at 
org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioning$$anonfun$apply$1.applyOrElse(V2ScanPartitioning.scala:46)
at 
org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioning$$anonfun$apply$1.applyOrElse(V2ScanPartitioning.scala:34)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
```
The CustomPartitioning works fine with Spark 3.2.1
This PR proposes to relax the code and treat all unknown partitioning the same 
way as that for UnknownPartitioning.


> Treat unknown partitioning as UnknownPartitioning
> -
>
> Key: SPARK-40508
> URL: https://issues.apache.org/jira/browse/SPARK-40508
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Ted Yu
>Priority: Major
>
> When running spark application against spark 3.3, I see the following :
> {code}
> java.lang.IllegalArgumentException: Unsupported data source V2 partitioning 
> type: CustomPartitioning
> at 
> org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioning$$anonfun$apply$1.applyOrElse(V2ScanPartitioning.scala:46)
> at 
> org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioning$$anonfun$apply$1.applyOrElse(V2ScanPartitioning.scala:34)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
> {code}
> The CustomPartitioning works fine with Spark 3.2.1
> This PR proposes to relax the code and treat all unknown partitioning the 
> same way as that for UnknownPartitioning.






[jira] [Created] (SPARK-40508) Treat unknown partitioning as UnknownPartitioning

2022-09-20 Thread Ted Yu (Jira)
Ted Yu created SPARK-40508:
--

 Summary: Treat unknown partitioning as UnknownPartitioning
 Key: SPARK-40508
 URL: https://issues.apache.org/jira/browse/SPARK-40508
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.3.0
Reporter: Ted Yu


When running spark application against spark 3.3, I see the following :
```
java.lang.IllegalArgumentException: Unsupported data source V2 partitioning 
type: CustomPartitioning
at 
org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioning$$anonfun$apply$1.applyOrElse(V2ScanPartitioning.scala:46)
at 
org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioning$$anonfun$apply$1.applyOrElse(V2ScanPartitioning.scala:34)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
```
The CustomPartitioning works fine with Spark 3.2.1
This PR proposes to relax the code and treat all unknown partitioning the same 
way as that for UnknownPartitioning.






[jira] [Created] (SPARK-39769) Rename trait Unevaluable

2022-07-13 Thread Ted Yu (Jira)
Ted Yu created SPARK-39769:
--

 Summary: Rename trait Unevaluable
 Key: SPARK-39769
 URL: https://issues.apache.org/jira/browse/SPARK-39769
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.3.0
Reporter: Ted Yu


I came upon `trait Unevaluable`, which is defined in 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala

Unevaluable is not a word.

There are `valuable` and `invaluable`, but I have never seen Unevaluable.

This issue renames the trait to Unevaluatable.






[jira] [Updated] (SPARK-38998) Call to MetricsSystem#getServletHandlers() may take place before MetricsSystem becomes running

2022-04-22 Thread Ted Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-38998:
---
Description: 
Sometimes the following exception is observed:
{code}
java.lang.IllegalArgumentException: requirement failed: Can only call 
getServletHandlers on a running MetricsSystem
at scala.Predef$.require(Predef.scala:281)
at 
org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:92)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:597)
at 
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
{code}
It seems the MetricsSystem was somehow stopped between the start() and 
getServletHandlers() calls.
SparkContext should check that the MetricsSystem is running before calling 
getServletHandlers(), as sketched below.
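A minimal, self-contained sketch of such a guard, assuming a hypothetical isRunning accessor and wait loop (the real MetricsSystem API may differ):
{code}
// Hypothetical sketch only; isRunning and the wait loop are illustrative.
trait MetricsSystemLike {
  def isRunning: Boolean
  def getServletHandlers: Seq[AnyRef]
}

object MetricsSystemGuard {
  def servletHandlersOnceRunning(
      metricsSystem: MetricsSystemLike,
      timeoutMs: Long = 10000L): Seq[AnyRef] = {
    val deadline = System.currentTimeMillis() + timeoutMs
    // Wait for MetricsSystem.start() to take effect instead of tripping the require() check.
    while (!metricsSystem.isRunning && System.currentTimeMillis() < deadline) {
      Thread.sleep(10)
    }
    if (metricsSystem.isRunning) metricsSystem.getServletHandlers else Seq.empty
  }
}
{code}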

  was:
Sometimes the following exception is observed:
{code}
java.lang.IllegalArgumentException: requirement failed: Can only call 
getServletHandlers on a running MetricsSystem
at scala.Predef$.require(Predef.scala:281)
at 
org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:92)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:597)
at 
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
{code}
It seems SparkContext should wait till the MetricsSystem becomes running before 
calling getServletHandlers()


> Call to MetricsSystem#getServletHandlers() may take place before 
> MetricsSystem becomes running
> --
>
> Key: SPARK-38998
> URL: https://issues.apache.org/jira/browse/SPARK-38998
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: Ted Yu
>Priority: Major
>
> Sometimes the following exception is observed:
> {code}
> java.lang.IllegalArgumentException: requirement failed: Can only call 
> getServletHandlers on a running MetricsSystem
> at scala.Predef$.require(Predef.scala:281)
> at 
> org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:92)
> at org.apache.spark.SparkContext.<init>(SparkContext.scala:597)
> at 
> org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> {code}
> It seems somehow the MetricsSystem was stopped in between start() and 
> getServletHandlers() calls.
> SparkContext should check the MetricsSystem becomes running before calling 
> getServletHandlers()






[jira] [Updated] (SPARK-38998) Call to MetricsSystem#getServletHandlers() may take place before MetricsSystem becomes running

2022-04-22 Thread Ted Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-38998:
---
Description: 
Sometimes the following exception is observed:
{code}
java.lang.IllegalArgumentException: requirement failed: Can only call 
getServletHandlers on a running MetricsSystem
at scala.Predef$.require(Predef.scala:281)
at 
org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:92)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:597)
at 
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
{code}
It seems SparkContext should wait till the MetricsSystem becomes running before 
calling getServletHandlers()

  was:
Sometimes the following exception is observed:
```
java.lang.IllegalArgumentException: requirement failed: Can only call 
getServletHandlers on a running MetricsSystem
at scala.Predef$.require(Predef.scala:281)
at 
org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:92)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:597)
at 
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
```
It seems SparkContext should wait till the MetricsSystem becomes running before 
calling getServletHandlers()


> Call to MetricsSystem#getServletHandlers() may take place before 
> MetricsSystem becomes running
> --
>
> Key: SPARK-38998
> URL: https://issues.apache.org/jira/browse/SPARK-38998
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: Ted Yu
>Priority: Major
>
> Sometimes the following exception is observed:
> {code}
> java.lang.IllegalArgumentException: requirement failed: Can only call 
> getServletHandlers on a running MetricsSystem
> at scala.Predef$.require(Predef.scala:281)
> at 
> org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:92)
> at org.apache.spark.SparkContext.<init>(SparkContext.scala:597)
> at 
> org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> {code}
> It seems SparkContext should wait till the MetricsSystem becomes running 
> before calling getServletHandlers()






[jira] [Created] (SPARK-38998) Call to MetricsSystem#getServletHandlers() may take place before MetricsSystem becomes running

2022-04-22 Thread Ted Yu (Jira)
Ted Yu created SPARK-38998:
--

 Summary: Call to MetricsSystem#getServletHandlers() may take place 
before MetricsSystem becomes running
 Key: SPARK-38998
 URL: https://issues.apache.org/jira/browse/SPARK-38998
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.1.2
Reporter: Ted Yu


Sometimes the following exception is observed:
```
java.lang.IllegalArgumentException: requirement failed: Can only call 
getServletHandlers on a running MetricsSystem
at scala.Predef$.require(Predef.scala:281)
at 
org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:92)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:597)
at 
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
```
It seems SparkContext should wait till the MetricsSystem becomes running before 
calling getServletHandlers()






[jira] [Commented] (SPARK-34532) IntervalUtils.add() may result in 'long overflow'

2021-02-24 Thread Ted Yu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17290646#comment-17290646
 ] 

Ted Yu commented on SPARK-34532:


Included the test command and some more information in the description.

You should see these errors when you run the command.

> IntervalUtils.add() may result in 'long overflow'
> -
>
> Key: SPARK-34532
> URL: https://issues.apache.org/jira/browse/SPARK-34532
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.2
>Reporter: Ted Yu
>Priority: Major
>
> I noticed the following when running test suite:
> build/sbt "sql/testOnly *SQLQueryTestSuite"
> {code}
> 19:10:17.977 ERROR org.apache.spark.scheduler.TaskSetManager: Task 1 in stage 
> 6416.0 failed 1 times; aborting job
> [info] - postgreSQL/int4.sql (2 seconds, 543 milliseconds)
> 19:10:20.994 ERROR org.apache.spark.executor.Executor: Exception in task 1.0 
> in stage 6476.0 (TID 7789)
> java.lang.ArithmeticException: long overflow
> at java.lang.Math.multiplyExact(Math.java:892)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown
>  Source)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
> at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:345)
> at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
> at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> {code}
> {code}
> 19:15:38.255 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 
> in stage 14744.0 (TID 16705)
> java.lang.ArithmeticException: long overflow
> at java.lang.Math.addExact(Math.java:809)
> at org.apache.spark.sql.types.LongExactNumeric$.plus(numerics.scala:105)
> at org.apache.spark.sql.types.LongExactNumeric$.plus(numerics.scala:104)
> at 
> org.apache.spark.sql.catalyst.expressions.Add.nullSafeEval(arithmetic.scala:268)
> at 
> org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:573)
> at 
> org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(InterpretedMutableProjection.scala:97)
> {code}
> This likely was caused by the following line:
> {code}
> val microseconds = left.microseconds + right.microseconds
> {code}
> We should check whether the addition would produce overflow before adding.






[jira] [Updated] (SPARK-34532) IntervalUtils.add() may result in 'long overflow'

2021-02-24 Thread Ted Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-34532:
---
Description: 
I noticed the following when running the test suite:

build/sbt "sql/testOnly *SQLQueryTestSuite"
{code}
19:10:17.977 ERROR org.apache.spark.scheduler.TaskSetManager: Task 1 in stage 
6416.0 failed 1 times; aborting job
[info] - postgreSQL/int4.sql (2 seconds, 543 milliseconds)
19:10:20.994 ERROR org.apache.spark.executor.Executor: Exception in task 1.0 in 
stage 6476.0 (TID 7789)
java.lang.ArithmeticException: long overflow
at java.lang.Math.multiplyExact(Math.java:892)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:345)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
{code}
{code}
19:15:38.255 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in 
stage 14744.0 (TID 16705)
java.lang.ArithmeticException: long overflow
at java.lang.Math.addExact(Math.java:809)
at org.apache.spark.sql.types.LongExactNumeric$.plus(numerics.scala:105)
at org.apache.spark.sql.types.LongExactNumeric$.plus(numerics.scala:104)
at 
org.apache.spark.sql.catalyst.expressions.Add.nullSafeEval(arithmetic.scala:268)
at 
org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:573)
at 
org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(InterpretedMutableProjection.scala:97)
{code}
This was likely caused by the following line:
{code}
val microseconds = left.microseconds + right.microseconds
{code}
We should check whether the addition would overflow before performing it, as sketched below.
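A minimal sketch of an overflow-checked addition using Math.addExact; it is illustrative only, not the actual IntervalUtils.add() implementation:
{code}
object IntervalAddSketch {
  // Hedged sketch: fail loudly on overflow instead of silently wrapping around.
  def addIntervalMicros(leftMicros: Long, rightMicros: Long): Long = {
    try {
      // Math.addExact throws ArithmeticException("long overflow") when the sum overflows.
      Math.addExact(leftMicros, rightMicros)
    } catch {
      case _: ArithmeticException =>
        throw new ArithmeticException(
          s"long overflow: cannot add $leftMicros and $rightMicros microseconds")
    }
  }
}
{code}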

  was:
I noticed the following when running test suite:
{code}
19:15:38.255 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in 
stage 14744.0 (TID 16705)
java.lang.ArithmeticException: long overflow
at java.lang.Math.addExact(Math.java:809)
at org.apache.spark.sql.types.LongExactNumeric$.plus(numerics.scala:105)
at org.apache.spark.sql.types.LongExactNumeric$.plus(numerics.scala:104)
at 
org.apache.spark.sql.catalyst.expressions.Add.nullSafeEval(arithmetic.scala:268)
at 
org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:573)
at 
org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(InterpretedMutableProjection.scala:97)
{code}
This likely was caused by the following line:
{code}
val microseconds = left.microseconds + right.microseconds
{code}
We should check whether the addition would produce overflow before adding.


> IntervalUtils.add() may result in 'long overflow'
> -
>
> Key: SPARK-34532
> URL: https://issues.apache.org/jira/browse/SPARK-34532
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.2
>Reporter: Ted Yu
>Priority: Major
>
> I noticed the following when running test suite:
> build/sbt "sql/testOnly *SQLQueryTestSuite"
> {code}
> 19:10:17.977 ERROR org.apache.spark.scheduler.TaskSetManager: Task 1 in stage 
> 6416.0 failed 1 times; aborting job
> [info] - postgreSQL/int4.sql (2 seconds, 543 milliseconds)
> 19:10:20.994 ERROR org.apache.spark.executor.Executor: Exception in task 1.0 
> in stage 6476.0 (TID 7789)
> java.lang.ArithmeticException: long overflow
> at java.lang.Math.multiplyExact(Math.java:892)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown
>  Source)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
> at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:345)
> at 
> or

[jira] [Created] (SPARK-34532) IntervalUtils.add() may result in 'long overflow'

2021-02-24 Thread Ted Yu (Jira)
Ted Yu created SPARK-34532:
--

 Summary: IntervalUtils.add() may result in 'long overflow'
 Key: SPARK-34532
 URL: https://issues.apache.org/jira/browse/SPARK-34532
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.0.2
Reporter: Ted Yu


I noticed the following when running test suite:
{code}
19:15:38.255 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in 
stage 14744.0 (TID 16705)
java.lang.ArithmeticException: long overflow
at java.lang.Math.addExact(Math.java:809)
at org.apache.spark.sql.types.LongExactNumeric$.plus(numerics.scala:105)
at org.apache.spark.sql.types.LongExactNumeric$.plus(numerics.scala:104)
at 
org.apache.spark.sql.catalyst.expressions.Add.nullSafeEval(arithmetic.scala:268)
at 
org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:573)
at 
org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(InterpretedMutableProjection.scala:97)
{code}
This likely was caused by the following line:
{code}
val microseconds = left.microseconds + right.microseconds
{code}
We should check whether the addition would produce overflow before adding.






[jira] [Commented] (SPARK-34476) Duplicate referenceNames are given for ambiguousReferences

2021-02-19 Thread Ted Yu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287403#comment-17287403
 ] 

Ted Yu commented on SPARK-34476:


The basic jsonb test is here:
https://github.com/yugabyte/yugabyte-db/blob/master/java/yb-cql-4x/src/test/java/org/yb/loadtest/TestSpark3Jsonb.java

I am working on adding a get_json_string() function (via a Spark extension) which 
is similar to get_json_object() but extracts the last jsonb field using '->>' 
instead of '->'.

> Duplicate referenceNames are given for ambiguousReferences
> --
>
> Key: SPARK-34476
> URL: https://issues.apache.org/jira/browse/SPARK-34476
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Ted Yu
>Priority: Major
>
> When running test with Spark extension that converts custom function to json 
> path expression, I saw the following in test output:
> {code}
> 2021-02-19 21:57:24,550 (Time-limited test) [INFO - 
> org.yb.loadtest.TestSpark3Jsonb.testJsonb(TestSpark3Jsonb.java:102)] plan is 
> == Physical Plan ==
> org.apache.spark.sql.AnalysisException: Reference 
> 'phone->'key'->1->'m'->2->>'b'' is ambiguous, could be: 
> mycatalog.test.person.phone->'key'->1->'m'->2->>'b', 
> mycatalog.test.person.phone->'key'->1->'m'->2->>'b'.; line 1 pos 8
> {code}
> Please note the candidates following 'could be' are the same.
> Here is the physical plan for a working query where phone is a jsonb column:
> {code}
> TakeOrderedAndProject(limit=2, orderBy=[id#6 ASC NULLS FIRST], 
> output=[id#6,address#7,key#0])
> +- *(1) Project [id#6, address#7, phone->'key'->1->'m'->2->'b'#12 AS key#0]
>+- BatchScan[id#6, address#7, phone->'key'->1->'m'->2->'b'#12] Cassandra 
> Scan: test.person
>  - Cassandra Filters: [[phone->'key'->1->'m'->2->>'b' >= ?, 100]]
>  - Requested Columns: [id,address,phone->'key'->1->'m'->2->'b']
> {code}
> The difference for the failed query is that it tries to use 
> {code}phone->'key'->1->'m'->2->>'b'{code} in the projection (which works as 
> part of filter).






[jira] [Updated] (SPARK-34476) Duplicate referenceNames are given for ambiguousReferences

2021-02-19 Thread Ted Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-34476:
---
Description: 
When running a test with a Spark extension that converts a custom function to a json 
path expression, I saw the following in the test output:
{code}
2021-02-19 21:57:24,550 (Time-limited test) [INFO - 
org.yb.loadtest.TestSpark3Jsonb.testJsonb(TestSpark3Jsonb.java:102)] plan is == 
Physical Plan ==
org.apache.spark.sql.AnalysisException: Reference 
'phone->'key'->1->'m'->2->>'b'' is ambiguous, could be: 
mycatalog.test.person.phone->'key'->1->'m'->2->>'b', 
mycatalog.test.person.phone->'key'->1->'m'->2->>'b'.; line 1 pos 8
{code}
Note that the two candidates following 'could be' are identical.
Here is the physical plan for a working query where phone is a jsonb column:
{code}
TakeOrderedAndProject(limit=2, orderBy=[id#6 ASC NULLS FIRST], 
output=[id#6,address#7,key#0])
+- *(1) Project [id#6, address#7, phone->'key'->1->'m'->2->'b'#12 AS key#0]
   +- BatchScan[id#6, address#7, phone->'key'->1->'m'->2->'b'#12] Cassandra 
Scan: test.person
 - Cassandra Filters: [[phone->'key'->1->'m'->2->>'b' >= ?, 100]]
 - Requested Columns: [id,address,phone->'key'->1->'m'->2->'b']
{code}
The difference for the failed query is that it uses 
{code}phone->'key'->1->'m'->2->>'b'{code} in the projection (the same expression 
works as part of a filter).

  was:
When running test with Spark extension that converts custom function to json 
path expression, I saw the following in test output:
{code}
2021-02-19 21:57:24,550 (Time-limited test) [INFO - 
org.yb.loadtest.TestSpark3Jsonb.testJsonb(TestSpark3Jsonb.java:102)] plan is == 
Physical Plan ==
org.apache.spark.sql.AnalysisException: Reference 
'phone->'key'->1->'m'->2->>'b'' is ambiguous, could be: 
mycatalog.test.person.phone->'key'->1->'m'->2->>'b', 
mycatalog.test.person.phone->'key'->1->'m'->2->>'b'.; line 1 pos 8
{code}
Please note the candidates following 'could be' are the same.
Here is the physical plan for a working query where phone is a jsonb column:
{code}
TakeOrderedAndProject(limit=2, orderBy=[id#6 ASC NULLS FIRST], 
output=[id#6,address#7,key#0])
+- *(1) Project [id#6, address#7, phone->'key'->1->'m'->2->'b'#12 AS key#0]
   +- BatchScan[id#6, address#7, phone->'key'->1->'m'->2->'b'#12] Cassandra 
Scan: test.person
 - Cassandra Filters: [[phone->'key'->1->'m'->2->>'b' >= ?, 100]]
 - Requested Columns: [id,address,phone->'key'->1->'m'->2->'b']
{code}
The difference for the failed query is that it tries to use 
phone->'key'->1->'m'->2->>'b' in the projection (which works as part of filter).


> Duplicate referenceNames are given for ambiguousReferences
> --
>
> Key: SPARK-34476
> URL: https://issues.apache.org/jira/browse/SPARK-34476
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Ted Yu
>Priority: Major
>
> When running test with Spark extension that converts custom function to json 
> path expression, I saw the following in test output:
> {code}
> 2021-02-19 21:57:24,550 (Time-limited test) [INFO - 
> org.yb.loadtest.TestSpark3Jsonb.testJsonb(TestSpark3Jsonb.java:102)] plan is 
> == Physical Plan ==
> org.apache.spark.sql.AnalysisException: Reference 
> 'phone->'key'->1->'m'->2->>'b'' is ambiguous, could be: 
> mycatalog.test.person.phone->'key'->1->'m'->2->>'b', 
> mycatalog.test.person.phone->'key'->1->'m'->2->>'b'.; line 1 pos 8
> {code}
> Please note the candidates following 'could be' are the same.
> Here is the physical plan for a working query where phone is a jsonb column:
> {code}
> TakeOrderedAndProject(limit=2, orderBy=[id#6 ASC NULLS FIRST], 
> output=[id#6,address#7,key#0])
> +- *(1) Project [id#6, address#7, phone->'key'->1->'m'->2->'b'#12 AS key#0]
>+- BatchScan[id#6, address#7, phone->'key'->1->'m'->2->'b'#12] Cassandra 
> Scan: test.person
>  - Cassandra Filters: [[phone->'key'->1->'m'->2->>'b' >= ?, 100]]
>  - Requested Columns: [id,address,phone->'key'->1->'m'->2->'b']
> {code}
> The difference for the failed query is that it tries to use 
> {code}phone->'key'->1->'m'->2->>'b'{code} in the projection (which works as 
> part of filter).






[jira] [Created] (SPARK-34476) Duplicate referenceNames are given for ambiguousReferences

2021-02-19 Thread Ted Yu (Jira)
Ted Yu created SPARK-34476:
--

 Summary: Duplicate referenceNames are given for ambiguousReferences
 Key: SPARK-34476
 URL: https://issues.apache.org/jira/browse/SPARK-34476
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: Ted Yu


When running test with Spark extension that converts custom function to json 
path expression, I saw the following in test output:
{code}
2021-02-19 21:57:24,550 (Time-limited test) [INFO - 
org.yb.loadtest.TestSpark3Jsonb.testJsonb(TestSpark3Jsonb.java:102)] plan is == 
Physical Plan ==
org.apache.spark.sql.AnalysisException: Reference 
'phone->'key'->1->'m'->2->>'b'' is ambiguous, could be: 
mycatalog.test.person.phone->'key'->1->'m'->2->>'b', 
mycatalog.test.person.phone->'key'->1->'m'->2->>'b'.; line 1 pos 8
{code}
Please note the candidates following 'could be' are the same.
Here is the physical plan for a working query where phone is a jsonb column:
{code}
TakeOrderedAndProject(limit=2, orderBy=[id#6 ASC NULLS FIRST], 
output=[id#6,address#7,key#0])
+- *(1) Project [id#6, address#7, phone->'key'->1->'m'->2->'b'#12 AS key#0]
   +- BatchScan[id#6, address#7, phone->'key'->1->'m'->2->'b'#12] Cassandra 
Scan: test.person
 - Cassandra Filters: [[phone->'key'->1->'m'->2->>'b' >= ?, 100]]
 - Requested Columns: [id,address,phone->'key'->1->'m'->2->'b']
{code}
The difference for the failed query is that it tries to use 
phone->'key'->1->'m'->2->>'b' in the projection (which works as part of filter).






[jira] [Commented] (SPARK-34017) Pass json column information via pruneColumns()

2021-01-05 Thread Ted Yu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259423#comment-17259423
 ] 

Ted Yu commented on SPARK-34017:


For PushDownUtils#pruneColumns, I am experimenting with the following:
{code}
  case r: SupportsPushDownRequiredColumns if 
SQLConf.get.nestedSchemaPruningEnabled =>
val JSONCapture = "get_json_object\\((.*), *(.*)\\)".r
var jsonRootFields : ArrayBuffer[RootField] = ArrayBuffer()
projects.map{ _.map{ f => f.toString match {
  case JSONCapture(column, field) =>
jsonRootFields += RootField(StructField(column, f.dataType, 
f.nullable),
  derivedFromAtt = false, prunedIfAnyChildAccessed = true)
  case _ => logDebug("else " + f)
}}}
val rootFields = SchemaPruning.identifyRootFields(projects, filters) ++ 
jsonRootFields
{code}

> Pass json column information via pruneColumns()
> ---
>
> Key: SPARK-34017
> URL: https://issues.apache.org/jira/browse/SPARK-34017
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Ted Yu
>Priority: Major
>
> Currently PushDownUtils#pruneColumns only passes root fields to 
> SupportsPushDownRequiredColumns implementation(s).
> {code}
> 2021-01-05 19:36:07,437 (Time-limited test) [DEBUG - 
> org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema 
> projection List(id#33, address#34, phone#36, get_json_object(phone#36, 
> $.code) AS get_json_object(phone, $.code)#37)
> 2021-01-05 19:36:07,438 (Time-limited test) [DEBUG - 
> org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema 
> StructType(StructField(id,IntegerType,false), 
> StructField(address,StringType,true), StructField(phone,StringType,true))
> {code}
> The first line shows projections and the second line shows the pruned schema.
> We can see that get_json_object(phone#36, $.code) is filtered. This 
> expression retrieves field 'code' from phone json column.
> We should allow json column information to be passed via pruneColumns().






[jira] [Updated] (SPARK-34017) Pass json column information via pruneColumns()

2021-01-05 Thread Ted Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-34017:
---
Description: 
Currently PushDownUtils#pruneColumns only passes root fields to 
SupportsPushDownRequiredColumns implementation(s).
{code}
2021-01-05 19:36:07,437 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema 
projection List(id#33, address#34, phone#36, get_json_object(phone#36, $.code) 
AS get_json_object(phone, $.code)#37)
2021-01-05 19:36:07,438 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema 
StructType(StructField(id,IntegerType,false), 
StructField(address,StringType,true), StructField(phone,StringType,true))
{code}
The first line shows projections and the second line shows the pruned schema.

We can see that get_json_object(phone#36, $.code) is filtered out. This expression 
retrieves the field 'code' from the phone json column.

We should allow json column information to be passed via pruneColumns().

  was:

Currently PushDownUtils#pruneColumns only passes root fields to 
SupportsPushDownRequiredColumns implementation(s).

2021-01-05 19:36:07,437 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema 
projection List(id#33, address#34, phone#36, get_json_object(phone#36, $.code) 
AS get_json_object(phone, $.code)#37)
2021-01-05 19:36:07,438 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema 
StructType(StructField(id,IntegerType,false), 
StructField(address,StringType,true), StructField(phone,StringType,true))

The first line shows projections and the second line shows the pruned schema.

We can see that get_json_object(phone#36, $.code) is filtered. This expression 
retrieves field 'code' from phone json column.

We should allow json column information to be passed via pruneColumns().


> Pass json column information via pruneColumns()
> ---
>
> Key: SPARK-34017
> URL: https://issues.apache.org/jira/browse/SPARK-34017
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Ted Yu
>Priority: Major
>
> Currently PushDownUtils#pruneColumns only passes root fields to 
> SupportsPushDownRequiredColumns implementation(s).
> {code}
> 2021-01-05 19:36:07,437 (Time-limited test) [DEBUG - 
> org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema 
> projection List(id#33, address#34, phone#36, get_json_object(phone#36, 
> $.code) AS get_json_object(phone, $.code)#37)
> 2021-01-05 19:36:07,438 (Time-limited test) [DEBUG - 
> org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema 
> StructType(StructField(id,IntegerType,false), 
> StructField(address,StringType,true), StructField(phone,StringType,true))
> {code}
> The first line shows projections and the second line shows the pruned schema.
> We can see that get_json_object(phone#36, $.code) is filtered. This 
> expression retrieves field 'code' from phone json column.
> We should allow json column information to be passed via pruneColumns().






[jira] [Created] (SPARK-34017) Pass json column information via pruneColumns()

2021-01-05 Thread Ted Yu (Jira)
Ted Yu created SPARK-34017:
--

 Summary: Pass json column information via pruneColumns()
 Key: SPARK-34017
 URL: https://issues.apache.org/jira/browse/SPARK-34017
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.1
Reporter: Ted Yu



Currently PushDownUtils#pruneColumns only passes root fields to 
SupportsPushDownRequiredColumns implementation(s).

2021-01-05 19:36:07,437 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema 
projection List(id#33, address#34, phone#36, get_json_object(phone#36, $.code) 
AS get_json_object(phone, $.code)#37)
2021-01-05 19:36:07,438 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema 
StructType(StructField(id,IntegerType,false), 
StructField(address,StringType,true), StructField(phone,StringType,true))

The first line shows projections and the second line shows the pruned schema.

We can see that get_json_object(phone#36, $.code) is filtered. This expression 
retrieves field 'code' from phone json column.

We should allow json column information to be passed via pruneColumns().






[jira] [Resolved] (SPARK-33997) Running bin/spark-sql gives NoSuchMethodError

2021-01-04 Thread Ted Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved SPARK-33997.

Resolution: Cannot Reproduce

Rebuilt Spark locally and the error was gone.

> Running bin/spark-sql gives NoSuchMethodError
> -
>
> Key: SPARK-33997
> URL: https://issues.apache.org/jira/browse/SPARK-33997
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Ted Yu
>Priority: Major
>
> I ran 'mvn install -Phive -Phive-thriftserver -DskipTests'
> Running bin/spark-sql gives the following error:
> {code}
> 21/01/05 00:06:06 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> Exception in thread "main" java.lang.NoSuchMethodError: 
> org.apache.spark.sql.internal.SharedState$.loadHiveConfFile$default$3()Lscala/collection/Map;
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:136)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:934)
> {code}
> Scala version 2.12.10






[jira] [Created] (SPARK-33997) Running bin/spark-sql gives NoSuchMethodError

2021-01-04 Thread Ted Yu (Jira)
Ted Yu created SPARK-33997:
--

 Summary: Running bin/spark-sql gives NoSuchMethodError
 Key: SPARK-33997
 URL: https://issues.apache.org/jira/browse/SPARK-33997
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.1
Reporter: Ted Yu


I ran 'mvn install -Phive -Phive-thriftserver -DskipTests'

Running bin/spark-sql gives the following error:
{code}
21/01/05 00:06:06 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
Exception in thread "main" java.lang.NoSuchMethodError: 
org.apache.spark.sql.internal.SharedState$.loadHiveConfFile$default$3()Lscala/collection/Map;
at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:136)
at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:934)
{code}
Scala version 2.12.10






[jira] [Commented] (SPARK-33915) Allow json expression to be pushable column

2021-01-02 Thread Ted Yu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17257647#comment-17257647
 ] 

Ted Yu commented on SPARK-33915:


Here is sample code for capturing the column and fields in downstream 
PredicatePushDown.scala
{code}
  private val JSONCapture = "`GetJsonObject\\((.*),(.*)\\)`".r
  private def transformGetJsonObject(p: Predicate): Predicate = {
val eq = p.asInstanceOf[sources.EqualTo]
eq.attribute match {
  case JSONCapture(column,field) =>
val colName = column.toString.split("#")(0)
val names = field.toString.split("\\.").foldLeft(List[String]()){(z, n) 
=> z :+ "->'"+n+"'" }
sources.EqualTo(colName + names.slice(1, names.size).mkString(""), 
eq.value).asInstanceOf[Predicate]
  case _ => sources.EqualTo("foo", "bar").asInstanceOf[Predicate]
}
  }
{code}

> Allow json expression to be pushable column
> ---
>
> Key: SPARK-33915
> URL: https://issues.apache.org/jira/browse/SPARK-33915
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Ted Yu
>Assignee: Apache Spark
>Priority: Major
>
> Currently PushableColumnBase provides no support for json / jsonb expression.
> Example of json expression:
> {code}
> get_json_object(phone, '$.code') = '1200'
> {code}
> If non-string literal is part of the expression, the presence of cast() would 
> complicate the situation.
> Implication is that implementation of SupportsPushDownFilters doesn't have a 
> chance to perform pushdown even if third party DB engine supports json 
> expression pushdown.
> This issue is for discussion and implementation of Spark core changes which 
> would allow json expression to be recognized as pushable column.






[jira] [Issue Comment Deleted] (SPARK-33915) Allow json expression to be pushable column

2021-01-02 Thread Ted Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-33915:
---
Comment: was deleted

(was: Opened https://github.com/apache/spark/pull/30984)

> Allow json expression to be pushable column
> ---
>
> Key: SPARK-33915
> URL: https://issues.apache.org/jira/browse/SPARK-33915
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Ted Yu
>Assignee: Apache Spark
>Priority: Major
>
> Currently PushableColumnBase provides no support for json / jsonb expression.
> Example of json expression:
> {code}
> get_json_object(phone, '$.code') = '1200'
> {code}
> If non-string literal is part of the expression, the presence of cast() would 
> complicate the situation.
> Implication is that implementation of SupportsPushDownFilters doesn't have a 
> chance to perform pushdown even if third party DB engine supports json 
> expression pushdown.
> This issue is for discussion and implementation of Spark core changes which 
> would allow json expression to be recognized as pushable column.






[jira] [Created] (SPARK-33953) readImages test fails due to NoClassDefFoundError for ImageTypeSpecifier

2020-12-31 Thread Ted Yu (Jira)
Ted Yu created SPARK-33953:
--

 Summary: readImages test fails due to NoClassDefFoundError for 
ImageTypeSpecifier
 Key: SPARK-33953
 URL: https://issues.apache.org/jira/browse/SPARK-33953
 Project: Spark
  Issue Type: Test
  Components: ML
Affects Versions: 3.0.1
Reporter: Ted Yu


From https://github.com/apache/spark/pull/30984/checks?check_run_id=1630709203 :
```
[info]   org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 1 in stage 21.0 failed 1 times, most recent failure: Lost task 1.0 in 
stage 21.0 (TID 20) (fv-az212-589.internal.cloudapp.net executor driver): 
java.lang.NoClassDefFoundError: Could not initialize class 
javax.imageio.ImageTypeSpecifier
[info]  at 
com.sun.imageio.plugins.png.PNGImageReader.getImageTypes(PNGImageReader.java:1531)
[info]  at 
com.sun.imageio.plugins.png.PNGImageReader.readImage(PNGImageReader.java:1318)
```
It seems some dependency is missing.






[jira] [Commented] (SPARK-33915) Allow json expression to be pushable column

2020-12-31 Thread Ted Yu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17257025#comment-17257025
 ] 

Ted Yu commented on SPARK-33915:


Opened https://github.com/apache/spark/pull/30984

> Allow json expression to be pushable column
> ---
>
> Key: SPARK-33915
> URL: https://issues.apache.org/jira/browse/SPARK-33915
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Ted Yu
>Priority: Major
>
> Currently PushableColumnBase provides no support for json / jsonb expression.
> Example of json expression:
> {code}
> get_json_object(phone, '$.code') = '1200'
> {code}
> If a non-string literal is part of the expression, the presence of cast() would 
> complicate the situation.
> The implication is that an implementation of SupportsPushDownFilters doesn't 
> have a chance to perform pushdown even if a third-party DB engine supports 
> json expression pushdown.
> This issue is for discussion and implementation of Spark core changes which 
> would allow a json expression to be recognized as a pushable column.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33915) Allow json expression to be pushable column

2020-12-31 Thread Ted Yu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255389#comment-17255389
 ] 

Ted Yu edited comment on SPARK-33915 at 12/31/20, 3:16 PM:
---

Here is the plan prior to predicate pushdown:
{code}
2020-12-26 03:28:59,926 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive 
execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0
+- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS 
phone#33]
   +- Filter (get_json_object(phone#37, $.code) = 1200)
  +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person
 - Cassandra Filters: []
 - Requested Columns: [id,address,phone]
{code}
Here is the plan with pushdown:
{code}
2020-12-28 01:40:08,150 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive 
execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0
+- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) 
AS phone#33]
   +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person
 - Cassandra Filters: [[phone->'code' = ?, 1200]]
 - Requested Columns: [id,address,phone]

{code}


was (Author: yuzhih...@gmail.com):
Here is the plan prior to predicate pushdown:
{code}
2020-12-26 03:28:59,926 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive 
execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0
+- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS 
phone#33]
   +- Filter (get_json_object(phone#37, $.phone) = 1200)
  +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person
 - Cassandra Filters: []
 - Requested Columns: [id,address,phone]
{code}
Here is the plan with pushdown:
{code}
2020-12-28 01:40:08,150 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive 
execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0
+- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) 
AS phone#33]
   +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person
 - Cassandra Filters: [[phone->'phone' = ?, 1200]]
 - Requested Columns: [id,address,phone]

{code}

> Allow json expression to be pushable column
> ---
>
> Key: SPARK-33915
> URL: https://issues.apache.org/jira/browse/SPARK-33915
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Ted Yu
>Priority: Major
>
> Currently PushableColumnBase provides no support for json / jsonb expression.
> Example of json expression:
> {code}
> get_json_object(phone, '$.code') = '1200'
> {code}
> If a non-string literal is part of the expression, the presence of cast() would 
> complicate the situation.
> The implication is that an implementation of SupportsPushDownFilters doesn't 
> have a chance to perform pushdown even if a third-party DB engine supports 
> json expression pushdown.
> This issue is for discussion and implementation of Spark core changes which 
> would allow a json expression to be recognized as a pushable column.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33915) Allow json expression to be pushable column

2020-12-29 Thread Ted Yu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255389#comment-17255389
 ] 

Ted Yu edited comment on SPARK-33915 at 12/29/20, 6:51 PM:
---

Here is the plan prior to predicate pushdown:
{code}
2020-12-26 03:28:59,926 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive 
execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0
+- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS 
phone#33]
   +- Filter (get_json_object(phone#37, $.phone) = 1200)
  +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person
 - Cassandra Filters: []
 - Requested Columns: [id,address,phone]
{code}
Here is the plan with pushdown:
{code}
2020-12-28 01:40:08,150 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive 
execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0
+- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) 
AS phone#33]
   +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person
 - Cassandra Filters: [[phone->'phone' = ?, 1200]]
 - Requested Columns: [id,address,phone]

{code}


was (Author: yuzhih...@gmail.com):
Here is the plan prior to predicate pushdown:
{code}
2020-12-26 03:28:59,926 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive 
execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0
+- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS 
phone#33]
   +- Filter (get_json_object(phone#37, $.phone) = 1200)
  +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person
 - Cassandra Filters: []
 - Requested Columns: [id,address,phone]
{code}
Here is the plan with pushdown:
{code}
2020-12-28 01:40:08,150 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive 
execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0
+- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS 
phone#33]
   +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person
 - Cassandra Filters: [["`GetJsonObject(phone#37,$.phone)`" = ?, 1200]]
 - Requested Columns: [id,address,phone]
{code}

> Allow json expression to be pushable column
> ---
>
> Key: SPARK-33915
> URL: https://issues.apache.org/jira/browse/SPARK-33915
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Ted Yu
>Priority: Major
>
> Currently PushableColumnBase provides no support for json / jsonb expression.
> Example of json expression:
> {code}
> get_json_object(phone, '$.code') = '1200'
> {code}
> If a non-string literal is part of the expression, the presence of cast() would 
> complicate the situation.
> The implication is that an implementation of SupportsPushDownFilters doesn't 
> have a chance to perform pushdown even if a third-party DB engine supports 
> json expression pushdown.
> This issue is for discussion and implementation of Spark core changes which 
> would allow a json expression to be recognized as a pushable column.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33915) Allow json expression to be pushable column

2020-12-28 Thread Ted Yu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255631#comment-17255631
 ] 

Ted Yu commented on SPARK-33915:


[~codingcat] [~XuanYuan] [~viirya] [~Alex Herman]
Could you share your comments?

Thanks

> Allow json expression to be pushable column
> ---
>
> Key: SPARK-33915
> URL: https://issues.apache.org/jira/browse/SPARK-33915
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Ted Yu
>Priority: Major
>
> Currently PushableColumnBase provides no support for json / jsonb expression.
> Example of json expression:
> {code}
> get_json_object(phone, '$.code') = '1200'
> {code}
> If a non-string literal is part of the expression, the presence of cast() would 
> complicate the situation.
> The implication is that an implementation of SupportsPushDownFilters doesn't 
> have a chance to perform pushdown even if a third-party DB engine supports 
> json expression pushdown.
> This issue is for discussion and implementation of Spark core changes which 
> would allow a json expression to be recognized as a pushable column.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33915) Allow json expression to be pushable column

2020-12-27 Thread Ted Yu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255389#comment-17255389
 ] 

Ted Yu commented on SPARK-33915:


Here is the plan prior to predicate pushdown:
{code}
2020-12-26 03:28:59,926 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive 
execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0
+- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS 
phone#33]
   +- Filter (get_json_object(phone#37, $.phone) = 1200)
  +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person
 - Cassandra Filters: []
 - Requested Columns: [id,address,phone]
{code}
Here is the plan with pushdown:
{code}
2020-12-28 01:40:08,150 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive 
execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0
+- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS 
phone#33]
   +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person
 - Cassandra Filters: [["`GetJsonObject(phone#37,$.phone)`" = ?, 1200]]
 - Requested Columns: [id,address,phone]
{code}

> Allow json expression to be pushable column
> ---
>
> Key: SPARK-33915
> URL: https://issues.apache.org/jira/browse/SPARK-33915
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Ted Yu
>Priority: Major
>
> Currently PushableColumnBase provides no support for json / jsonb expression.
> Example of json expression:
> {code}
> get_json_object(phone, '$.code') = '1200'
> {code}
> If a non-string literal is part of the expression, the presence of cast() would 
> complicate the situation.
> The implication is that an implementation of SupportsPushDownFilters doesn't 
> have a chance to perform pushdown even if a third-party DB engine supports 
> json expression pushdown.
> This issue is for discussion and implementation of Spark core changes which 
> would allow a json expression to be recognized as a pushable column.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33915) Allow json expression to be pushable column

2020-12-25 Thread Ted Yu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254944#comment-17254944
 ] 

Ted Yu edited comment on SPARK-33915 at 12/26/20, 4:14 AM:
---

I have experimented with two patches which allow a json expression to be a 
pushable column (without the presence of cast).

The first involves duplicating GetJsonObject as GetJsonString, whose eval() 
method returns UTF8String. FunctionRegistry.scala and functions.scala would 
have corresponding changes.
In the PushableColumnBase class, a new case for GetJsonString is added.

Since code duplication is undesirable and a case class cannot be directly 
subclassed, there is more work to be done in this direction.

The second is simpler: the return type of GetJsonObject#eval is changed to 
UTF8String.
I have run the catalyst / sql core unit tests, which passed.
In the PushableColumnBase class, a new case for GetJsonObject is added.

In the test output, I observed the following (for the second patch; the output 
for the first patch is similar):
{code}
Post-Scan Filters: (get_json_object(phone#37, $.code) = 1200)
{code}
Comments on the two approaches (or any other approach) are welcome.

Below is a snippet for the second approach.
{code}
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
index c22b68890a..8004ddd735 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
@@ -139,7 +139,7 @@ case class GetJsonObject(json: Expression, path: Expression)

   @transient private lazy val parsedPath = 
parsePath(path.eval().asInstanceOf[UTF8String])

-  override def eval(input: InternalRow): Any = {
+  override def eval(input: InternalRow): UTF8String = {
 val jsonStr = json.eval(input).asInstanceOf[UTF8String]
 if (jsonStr == null) {
   return null
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
index e4f001d61a..1cdc2642ba 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
@@ -723,6 +725,12 @@ abstract class PushableColumnBase {
 }
   case s: GetStructField if nestedPredicatePushdownEnabled =>
 helper(s.child).map(_ :+ s.childSchema(s.ordinal).name)
+  case GetJsonObject(col, field) =>
+Some(Seq("GetJsonObject(" + col + "," + field + ")"))
   case _ => None
 }
 helper(e).map(_.quoted)
{code}
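
For context, here is a minimal sketch of the kind of query this targets, assuming a Cassandra-backed table registered as test.person to match the plans above (the connector format string and session setup are assumptions for illustration):
{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, get_json_object}

val spark = SparkSession.builder().appName("json-pushdown-sketch").getOrCreate()

val df = spark.read
  .format("org.apache.spark.sql.cassandra")   // assumed connector format
  .options(Map("keyspace" -> "test", "table" -> "person"))
  .load()

// Without the patch this predicate stays as a Post-Scan Filter; with it,
// PushableColumnBase can hand the json expression to SupportsPushDownFilters.
df.filter(get_json_object(col("phone"), "$.code") === "1200").show()
{code}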


was (Author: yuzhih...@gmail.com):
I have experimented with two patches which allow a json expression to be a 
pushable column (without the presence of cast).

The first involves duplicating GetJsonObject as GetJsonString, whose eval() 
method returns UTF8String. FunctionRegistry.scala and functions.scala would 
have corresponding changes.
In the PushableColumnBase class, a new case for GetJsonString is added.

Since code duplication is undesirable and a case class cannot be directly 
subclassed, there is more work to be done in this direction.

The second is simpler: the return type of GetJsonObject#eval is changed to 
UTF8String.
I have run the catalyst / sql core unit tests, which passed.
In the PushableColumnBase class, a new case for GetJsonObject is added.

In the test output, I observed the following (for the second patch; the output 
for the first patch is similar):
{code}
Post-Scan Filters: (get_json_object(phone#37, $.phone) = 1200)
{code}
Comments on the two approaches (or any other approach) are welcome.

Below is a snippet for the second approach.
{code}
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
index c22b68890a..8004ddd735 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
@@ -139,7 +139,7 @@ case class GetJsonObject(json: Expression, path: Expression)

   @transient private lazy val parsedPath = 
parsePath(path.eval().asInstanceOf[UTF8String])

-  override def eval(input: InternalRow): Any = {
+  override def eval(input: InternalRow): UTF8String = {
 val jsonStr = json.eval(input).asInstanceOf[UTF8String]
 if (jsonStr == null) {
   return null
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.s

[jira] [Comment Edited] (SPARK-33915) Allow json expression to be pushable column

2020-12-25 Thread Ted Yu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254944#comment-17254944
 ] 

Ted Yu edited comment on SPARK-33915 at 12/26/20, 4:14 AM:
---

I have experimented with two patches which allow a json expression to be a 
pushable column (without the presence of cast).

The first involves duplicating GetJsonObject as GetJsonString, whose eval() 
method returns UTF8String. FunctionRegistry.scala and functions.scala would 
have corresponding changes.
In the PushableColumnBase class, a new case for GetJsonString is added.

Since code duplication is undesirable and a case class cannot be directly 
subclassed, there is more work to be done in this direction.

The second is simpler: the return type of GetJsonObject#eval is changed to 
UTF8String.
I have run the catalyst / sql core unit tests, which passed.
In the PushableColumnBase class, a new case for GetJsonObject is added.

In the test output, I observed the following (for the second patch; the output 
for the first patch is similar):
{code}
Post-Scan Filters: (get_json_object(phone#37, $.phone) = 1200)
{code}
Comments on the two approaches (or any other approach) are welcome.

Below is a snippet for the second approach.
{code}
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
index c22b68890a..8004ddd735 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
@@ -139,7 +139,7 @@ case class GetJsonObject(json: Expression, path: Expression)

   @transient private lazy val parsedPath = 
parsePath(path.eval().asInstanceOf[UTF8String])

-  override def eval(input: InternalRow): Any = {
+  override def eval(input: InternalRow): UTF8String = {
 val jsonStr = json.eval(input).asInstanceOf[UTF8String]
 if (jsonStr == null) {
   return null
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
index e4f001d61a..1cdc2642ba 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
@@ -723,6 +725,12 @@ abstract class PushableColumnBase {
 }
   case s: GetStructField if nestedPredicatePushdownEnabled =>
 helper(s.child).map(_ :+ s.childSchema(s.ordinal).name)
+  case GetJsonObject(col, field) =>
+Some(Seq("GetJsonObject(" + col + "," + field + ")"))
   case _ => None
 }
 helper(e).map(_.quoted)
{code}


was (Author: yuzhih...@gmail.com):
I have experimented with two patches which allow a json expression to be a 
pushable column (without the presence of cast).

The first involves duplicating GetJsonObject as GetJsonString, whose eval() 
method returns UTF8String. FunctionRegistry.scala and functions.scala would 
have corresponding changes.
In the PushableColumnBase class, a new case for GetJsonString is added.

Since code duplication is undesirable and a case class cannot be directly 
subclassed, there is more work to be done in this direction.

The second is simpler: the return type of GetJsonObject#eval is changed to 
UTF8String.
I have run the catalyst / sql core unit tests, which passed.
In the PushableColumnBase class, a new case for GetJsonObject is added.

In the test output, I observed the following (for the second patch; the output 
for the first patch is similar):

Post-Scan Filters: (get_json_object(phone#37, $.phone) = 1200)

Comments on the two approaches (or any other approach) are welcome.

Below is a snippet for the second approach.
{code}
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
index c22b68890a..8004ddd735 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
@@ -139,7 +139,7 @@ case class GetJsonObject(json: Expression, path: Expression)

   @transient private lazy val parsedPath = 
parsePath(path.eval().asInstanceOf[UTF8String])

-  override def eval(input: InternalRow): Any = {
+  override def eval(input: InternalRow): UTF8String = {
 val jsonStr = json.eval(input).asInstanceOf[UTF8String]
 if (jsonStr == null) {
   return null
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
index 

[jira] [Commented] (SPARK-33915) Allow json expression to be pushable column

2020-12-25 Thread Ted Yu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254944#comment-17254944
 ] 

Ted Yu commented on SPARK-33915:


I have experimented with two patches which allow a json expression to be a 
pushable column (without the presence of cast).

The first involves duplicating GetJsonObject as GetJsonString, whose eval() 
method returns UTF8String. FunctionRegistry.scala and functions.scala would 
have corresponding changes.
In the PushableColumnBase class, a new case for GetJsonString is added.

Since code duplication is undesirable and a case class cannot be directly 
subclassed, there is more work to be done in this direction.

The second is simpler: the return type of GetJsonObject#eval is changed to 
UTF8String.
I have run the catalyst / sql core unit tests, which passed.
In the PushableColumnBase class, a new case for GetJsonObject is added.

In the test output, I observed the following (for the second patch; the output 
for the first patch is similar):

Post-Scan Filters: (get_json_object(phone#37, $.phone) = 1200)

Comments on the two approaches (or any other approach) are welcome.

Below is a snippet for the second approach.
{code}
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
index c22b68890a..8004ddd735 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
@@ -139,7 +139,7 @@ case class GetJsonObject(json: Expression, path: Expression)

   @transient private lazy val parsedPath = 
parsePath(path.eval().asInstanceOf[UTF8String])

-  override def eval(input: InternalRow): Any = {
+  override def eval(input: InternalRow): UTF8String = {
 val jsonStr = json.eval(input).asInstanceOf[UTF8String]
 if (jsonStr == null) {
   return null
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
index e4f001d61a..1cdc2642ba 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
@@ -723,6 +725,12 @@ abstract class PushableColumnBase {
 }
   case s: GetStructField if nestedPredicatePushdownEnabled =>
 helper(s.child).map(_ :+ s.childSchema(s.ordinal).name)
+  case GetJsonObject(col, field) =>
+Some(Seq("GetJsonObject(" + col + "," + field + ")"))
   case _ => None
 }
 helper(e).map(_.quoted)
{code}

> Allow json expression to be pushable column
> ---
>
> Key: SPARK-33915
> URL: https://issues.apache.org/jira/browse/SPARK-33915
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Ted Yu
>Priority: Major
>
> Currently PushableColumnBase provides no support for json / jsonb expression.
> Example of json expression:
> {code}
> get_json_object(phone, '$.code') = '1200'
> {code}
> If a non-string literal is part of the expression, the presence of cast() would 
> complicate the situation.
> The implication is that an implementation of SupportsPushDownFilters doesn't 
> have a chance to perform pushdown even if a third-party DB engine supports 
> json expression pushdown.
> This issue is for discussion and implementation of Spark core changes which 
> would allow a json expression to be recognized as a pushable column.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33915) Allow json expression to be pushable column

2020-12-25 Thread Ted Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-33915:
---
Description: 
Currently PushableColumnBase provides no support for json / jsonb expression.

Example of json expression:
{code}
get_json_object(phone, '$.code') = '1200'
{code}
If a non-string literal is part of the expression, the presence of cast() would 
complicate the situation.

The implication is that an implementation of SupportsPushDownFilters doesn't 
have a chance to perform pushdown even if a third-party DB engine supports 
json expression pushdown.

This issue is for discussion and implementation of Spark core changes which 
would allow a json expression to be recognized as a pushable column.

  was:
Currently PushableColumnBase provides no support for json / jsonb expression.

Example of json expression:
get_json_object(phone, '$.code') = '1200'

If a non-string literal is part of the expression, the presence of cast() would 
complicate the situation.

The implication is that an implementation of SupportsPushDownFilters doesn't 
have a chance to perform pushdown even if a third-party DB engine supports 
json expression pushdown.

This issue is for discussion and implementation of Spark core changes which 
would allow a json expression to be recognized as a pushable column.


> Allow json expression to be pushable column
> ---
>
> Key: SPARK-33915
> URL: https://issues.apache.org/jira/browse/SPARK-33915
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Ted Yu
>Priority: Major
>
> Currently PushableColumnBase provides no support for json / jsonb expression.
> Example of json expression:
> {code}
> get_json_object(phone, '$.code') = '1200'
> {code}
> If a non-string literal is part of the expression, the presence of cast() would 
> complicate the situation.
> The implication is that an implementation of SupportsPushDownFilters doesn't 
> have a chance to perform pushdown even if a third-party DB engine supports 
> json expression pushdown.
> This issue is for discussion and implementation of Spark core changes which 
> would allow a json expression to be recognized as a pushable column.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33915) Allow json expression to be pushable column

2020-12-25 Thread Ted Yu (Jira)
Ted Yu created SPARK-33915:
--

 Summary: Allow json expression to be pushable column
 Key: SPARK-33915
 URL: https://issues.apache.org/jira/browse/SPARK-33915
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.0.1
Reporter: Ted Yu


Currently PushableColumnBase provides no support for json / jsonb expression.

Example of json expression:
get_json_object(phone, '$.code') = '1200'

If a non-string literal is part of the expression, the presence of cast() would 
complicate the situation.

The implication is that an implementation of SupportsPushDownFilters doesn't 
have a chance to perform pushdown even if a third-party DB engine supports 
json expression pushdown.

This issue is for discussion and implementation of Spark core changes which 
would allow a json expression to be recognized as a pushable column.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25954) Upgrade to Kafka 2.1.0

2018-11-06 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677152#comment-16677152
 ] 

Ted Yu commented on SPARK-25954:


Looking at the Kafka dev thread, a message from Satish indicated there may be 
another RC coming.

> Upgrade to Kafka 2.1.0
> --
>
> Key: SPARK-25954
> URL: https://issues.apache.org/jira/browse/SPARK-25954
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> Kafka 2.1.0 RC0 is started. Since this includes official KAFKA-7264 JDK 11 
> support, we had better use that.
> - 
> https://lists.apache.org/thread.html/8288f0afdfed4d329f1a8338320b6e24e7684a0593b4bbd6f1b79101@%3Cdev.kafka.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25201) Synchronization performed on AtomicReference in LevelDB class

2018-08-22 Thread Ted Yu (JIRA)
Ted Yu created SPARK-25201:
--

 Summary: Synchronization performed on AtomicReference in LevelDB 
class
 Key: SPARK-25201
 URL: https://issues.apache.org/jira/browse/SPARK-25201
 Project: Spark
  Issue Type: Bug
  Components: Input/Output
Affects Versions: 2.3.1
Reporter: Ted Yu


Here is the related code:
{code}
  void closeIterator(LevelDBIterator it) throws IOException {
    synchronized (this._db) {
{code}
Here this._db is an AtomicReference, so the synchronization is performed on the 
AtomicReference object itself rather than on the underlying DB handle. There is 
more than one occurrence of such usage.
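
For illustration only, here is a minimal sketch of the usual alternative, locking on a dedicated monitor instead of the AtomicReference itself (the class, field and method names below are hypothetical, not taken from the actual LevelDB class):
{code}
import java.util.concurrent.atomic.AtomicReference

class StoreSketch(initialDb: AnyRef) {
  // keep the AtomicReference for lock-free reads and swaps of the handle...
  private val db = new AtomicReference[AnyRef](initialDb)
  // ...but take a dedicated monitor object for mutual exclusion
  private val dbLock = new Object

  def closeIterator(doClose: () => Unit): Unit = dbLock.synchronized {
    if (db.get() != null) {
      doClose()
    }
  }
}
{code}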



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23636) [SPARK 2.2] | Kafka Consumer | KafkaUtils.createRDD throws Exception - java.util.ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access

2018-06-28 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16527204#comment-16527204
 ] 

Ted Yu commented on SPARK-23636:


It seems that in KafkaDataConsumer#close:
{code}
  def close(): Unit = consumer.close()
{code}
the code should catch ConcurrentModificationException and retry closing the 
consumer.
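
A minimal sketch of that idea (the retry helper below is hypothetical, not the actual Spark code):
{code}
import java.util.ConcurrentModificationException
import org.apache.kafka.clients.consumer.KafkaConsumer

def closeWithRetry[K, V](consumer: KafkaConsumer[K, V], attempts: Int = 2): Unit = {
  try {
    consumer.close()
  } catch {
    case _: ConcurrentModificationException if attempts > 1 =>
      // another thread currently holds the consumer; back off briefly and retry
      Thread.sleep(100)
      closeWithRetry(consumer, attempts - 1)
  }
}
{code}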

> [SPARK 2.2] | Kafka Consumer | KafkaUtils.createRDD throws Exception - 
> java.util.ConcurrentModificationException: KafkaConsumer is not safe for 
> multi-threaded access
> -
>
> Key: SPARK-23636
> URL: https://issues.apache.org/jira/browse/SPARK-23636
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.1.1, 2.2.0
>Reporter: Deepak
>Priority: Major
>  Labels: performance
>
> h2.  
> h2. Summary
>  
> While using the KafkaUtils.createRDD API - we receive below listed error, 
> specifically when 1 executor connects to 1 kafka topic-partition, but with 
> more than 1 core & fetches an Array(OffsetRanges)
>  
> _I've tagged this issue to "Structured Streaming" - as I could not find a 
> more appropriate component_ 
>  
> 
> h2. Error Faced
> {noformat}
> java.util.ConcurrentModificationException: KafkaConsumer is not safe for 
> multi-threaded access{noformat}
>  Stack Trace
> {noformat}
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 5 in stage 1.0 failed 4 times, most recent failure: Lost task 5.3 in 
> stage 1.0 (TID 17, host, executor 16): 
> java.util.ConcurrentModificationException: KafkaConsumer is not safe for 
> multi-threaded access
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.acquire(KafkaConsumer.java:1629)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.close(KafkaConsumer.java:1528)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.close(KafkaConsumer.java:1508)
> at 
> org.apache.spark.streaming.kafka010.CachedKafkaConsumer.close(CachedKafkaConsumer.scala:59)
> at 
> org.apache.spark.streaming.kafka010.CachedKafkaConsumer$.remove(CachedKafkaConsumer.scala:185)
> at 
> org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.(KafkaRDD.scala:204)
> at org.apache.spark.streaming.kafka010.KafkaRDD.compute(KafkaRDD.scala:181)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323){noformat}
>  
> 
> h2. Config Used to simulate the error
> A session with : 
>  * Executors - 1
>  * Cores - 2 or More
>  * Kafka Topic - has only 1 partition
>  * While fetching - More than one Array of Offset Range , Example 
> {noformat}
> Array(OffsetRange("kafka_topic",0,608954201,608954202),
> OffsetRange("kafka_topic",0,608954202,608954203)
> ){noformat}
>  
> 
> h2. Was this approach working before?
>  
> This was working in spark 1.6.2
> However, from spark 2.1 onwards - the approach throws exception
>  
> 
> h2. Why are we fetching from kafka as mentioned above.
>  
> This gives us the capability to establish a connection to Kafka Broker for 
> every spark executor's core, thus each core can fetch/process its own set of 
> messages based on the specified (offset ranges).
>  
>  
> 
> h2. Sample Code
>  
> {quote}scala snippet - on versions spark 2.2.0 or 2.1.0
> // Bunch of imports
> import kafka.serializer.\{DefaultDecoder, StringDecoder}
>  import org.apache.avro.generic.GenericRecord
>  import org.apache.kafka.clients.consumer.ConsumerRecord
>  import org.apache.kafka.common.serialization._
>  import org.apache.spark.rdd.RDD
>  import org.apache.spark.sql.\{DataFrame, Row, SQLContext}
>  import org.apache.spark.sql.Row
>  import org.apache.spark.sql.hive.HiveContext
>  import org.apache.spark.sql.types.\{StringType, StructField, StructType}
>  import org.apache.spark.streaming.kafka010._
>  import org.apache.spark.streaming.kafka010.KafkaUtils._
> {quote}
> {quote}// This forces two connections - from a single executor - to 
> topic-partition .
> // And with 2 cores assigned to 1 executor : each core has a task - pulling 
> respective offsets : OffsetRange("kafka_topic",0,1,2) & 
> OffsetRange("kafka_topic",0,2,3)
> val parallelizedRanges = Array(OffsetRange("kafka_topic",0,1,2), // Fetching 
> sample 2 records 
>  OffsetRange("kafka_topic",0,2,3) // Fetching sample 2 records 
>  )
>  
> // Initiate kafka properties
> val kafkaParams1: java.util.Map[String, Object] = new java.util.HashMap()
> // kafkaParams1.put("key","val") add all the parameters such as broker, 
> topic Not listing every property here.
>  
> // Create RDD
> val rDDConsumerRec: RDD[ConsumerRecord[String, String]] =
>  createRDD[String, String](sparkContext
>  , kafkaParams1, parallelizedRanges, LocationStrategies.Prefe

[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 0.10.0.1 to 1.1.0

2018-06-03 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499535#comment-16499535
 ] 

Ted Yu commented on SPARK-18057:


I have created a PR, shown above.

Kafka 2.0.0 is used since that would be the release into which KIP-266 is 
integrated.

If there is no objection, the JIRA title should be modified.

> Update structured streaming kafka from 0.10.0.1 to 1.1.0
> 
>
> Key: SPARK-18057
> URL: https://issues.apache.org/jira/browse/SPARK-18057
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Reporter: Cody Koeninger
>Priority: Major
>
> There are a couple of relevant KIPs here, 
> https://archive.apache.org/dist/kafka/0.10.1.0/RELEASE_NOTES.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 0.10.0.1 to 1.1.0

2018-05-31 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497070#comment-16497070
 ] 

Ted Yu commented on SPARK-18057:


I tend to agree with Cody.

Just wondering whether other people would accept the kafka-0-10-sql module 
referencing the Kafka 2.0.0 release.

> Update structured streaming kafka from 0.10.0.1 to 1.1.0
> 
>
> Key: SPARK-18057
> URL: https://issues.apache.org/jira/browse/SPARK-18057
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Reporter: Cody Koeninger
>Priority: Major
>
> There are a couple of relevant KIPs here, 
> https://archive.apache.org/dist/kafka/0.10.1.0/RELEASE_NOTES.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 0.10.0.1 to 1.1.0

2018-05-30 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495580#comment-16495580
 ] 

Ted Yu commented on SPARK-18057:


There are compilation errors in KafkaTestUtils.scala against the Kafka 1.0.1 
release.

What's the preferred way forward:

* use reflection in KafkaTestUtils.scala
* create external/kafka-1-0-sql module

Thanks

> Update structured streaming kafka from 0.10.0.1 to 1.1.0
> 
>
> Key: SPARK-18057
> URL: https://issues.apache.org/jira/browse/SPARK-18057
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Reporter: Cody Koeninger
>Priority: Major
>
> There are a couple of relevant KIPs here, 
> https://archive.apache.org/dist/kafka/0.10.1.0/RELEASE_NOTES.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21569) Internal Spark class needs to be kryo-registered

2018-05-11 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472295#comment-16472295
 ] 

Ted Yu commented on SPARK-21569:


What would be the workaround?
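
One candidate workaround (an assumption on my part, not verified against the linked repro) would be to register the class explicitly, e.g.:
{code}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrationRequired", "true")
  // register the internal class by name so Kryo accepts it
  .set("spark.kryo.classesToRegister",
    "org.apache.spark.internal.io.FileCommitProtocol$TaskCommitMessage")
{code}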

Thanks

> Internal Spark class needs to be kryo-registered
> 
>
> Key: SPARK-21569
> URL: https://issues.apache.org/jira/browse/SPARK-21569
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Ryan Williams
>Priority: Major
>
> [Full repro here|https://github.com/ryan-williams/spark-bugs/tree/hf]
> As of 2.2.0, {{saveAsNewAPIHadoopFile}} jobs fail (when 
> {{spark.kryo.registrationRequired=true}}) with:
> {code}
> java.lang.IllegalArgumentException: Class is not registered: 
> org.apache.spark.internal.io.FileCommitProtocol$TaskCommitMessage
> Note: To register this class use: 
> kryo.register(org.apache.spark.internal.io.FileCommitProtocol$TaskCommitMessage.class);
>   at com.esotericsoftware.kryo.Kryo.getRegistration(Kryo.java:458)
>   at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:79)
>   at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:488)
>   at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:593)
>   at 
> org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:315)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:383)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> This internal Spark class should be kryo-registered by Spark by default.
> This was not a problem in 2.1.1.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23714) Add metrics for cached KafkaConsumer

2018-03-27 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416138#comment-16416138
 ] 

Ted Yu commented on SPARK-23714:


[~tdas]:
What do you think?

Thanks

> Add metrics for cached KafkaConsumer
> 
>
> Key: SPARK-23714
> URL: https://issues.apache.org/jira/browse/SPARK-23714
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.3.0
>Reporter: Ted Yu
>Priority: Major
>
> SPARK-23623 added KafkaDataConsumer to avoid concurrent use of cached 
> KafkaConsumer.
> This JIRA is to add metrics for measuring the operations of the cache so that 
> users can gain insight into the caching solution.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23714) Add metrics for cached KafkaConsumer

2018-03-17 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403792#comment-16403792
 ] 

Ted Yu commented on SPARK-23714:


Ryan reminded me of the possibility of using the existing metrics system.
I think that implies passing SparkEnv to KafkaDataConsumer.

> Add metrics for cached KafkaConsumer
> 
>
> Key: SPARK-23714
> URL: https://issues.apache.org/jira/browse/SPARK-23714
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.3.0
>Reporter: Ted Yu
>Priority: Major
>
> SPARK-23623 added KafkaDataConsumer to avoid concurrent use of cached 
> KafkaConsumer.
> This JIRA is to add metrics for measuring the operations of the cache so that 
> users can gain insight into the caching solution.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23714) Add metrics for cached KafkaConsumer

2018-03-16 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403112#comment-16403112
 ] 

Ted Yu commented on SPARK-23714:


[~tdas] made the following suggestion:

Apache Commons Pool does expose metrics on idle/active counts. See 
https://commons.apache.org/proper/commons-pool/apidocs/org/apache/commons/pool2/impl/BaseGenericObjectPool.html
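
To illustrate, a minimal sketch of reading those counters from a commons-pool2 pool (the pooled type below is a placeholder, not the actual consumer cache):
{code}
import org.apache.commons.pool2.{BasePooledObjectFactory, PooledObject}
import org.apache.commons.pool2.impl.{DefaultPooledObject, GenericObjectPool}

class PlaceholderFactory extends BasePooledObjectFactory[String] {
  override def create(): String = "consumer"
  override def wrap(obj: String): PooledObject[String] = new DefaultPooledObject(obj)
}

val pool = new GenericObjectPool[String](new PlaceholderFactory)
val borrowed = pool.borrowObject()
// BaseGenericObjectPool exposes the idle/active counts mentioned above
println(s"active=${pool.getNumActive}, idle=${pool.getNumIdle}")
pool.returnObject(borrowed)
{code}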

> Add metrics for cached KafkaConsumer
> 
>
> Key: SPARK-23714
> URL: https://issues.apache.org/jira/browse/SPARK-23714
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.3.0
>Reporter: Ted Yu
>Priority: Major
>
> SPARK-23623 added KafkaDataConsumer to avoid concurrent use of cached 
> KafkaConsumer.
> This JIRA is to add metrics for measuring the operations of the cache so that 
> users can gain insight into the caching solution.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23714) Add metrics for cached KafkaConsumer

2018-03-16 Thread Ted Yu (JIRA)
Ted Yu created SPARK-23714:
--

 Summary: Add metrics for cached KafkaConsumer
 Key: SPARK-23714
 URL: https://issues.apache.org/jira/browse/SPARK-23714
 Project: Spark
  Issue Type: Improvement
  Components: Structured Streaming
Affects Versions: 2.3.0
Reporter: Ted Yu


SPARK-23623 added KafkaDataConsumer to avoid concurrent use of cached 
KafkaConsumer.

This JIRA is to add metrics for measuring the operations of the cache so that 
users can gain insight into the caching solution.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23347) Introduce buffer between Java data stream and gzip stream

2018-02-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354939#comment-16354939
 ] 

Ted Yu commented on SPARK-23347:


{code}
  public final byte[] serialize(Object o) throws Exception {
{code}
when o is an Integer, I assume.

I looked at the ObjectMapper.java source code where the generator is involved.
Need to dig some more.

> Introduce buffer between Java data stream and gzip stream
> -
>
> Key: SPARK-23347
> URL: https://issues.apache.org/jira/browse/SPARK-23347
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Ted Yu
>Priority: Minor
>
> Currently GZIPOutputStream is used directly around ByteArrayOutputStream 
> e.g. from KVStoreSerializer :
> {code}
>   ByteArrayOutputStream bytes = new ByteArrayOutputStream();
>   GZIPOutputStream out = new GZIPOutputStream(bytes);
> {code}
> This seems inefficient.
> GZIPOutputStream does not override the single-byte write(int) method; the 
> inherited DeflaterOutputStream implementation wraps each byte in a one-element 
> array and ends up calling the corresponding JNI zlib function per byte.
> A BufferedOutputStream can be introduced wrapping the GZIPOutputStream for 
> better performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23347) Introduce buffer between Java data stream and gzip stream

2018-02-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354907#comment-16354907
 ] 

Ted Yu commented on SPARK-23347:


See JDK 1.8 code:
{code}
class DeflaterOutputStream {
    public void write(int b) throws IOException {
        byte[] buf = new byte[1];
        buf[0] = (byte)(b & 0xff);
        write(buf, 0, 1);
    }

    public void write(byte[] b, int off, int len) throws IOException {
        if (def.finished()) {
            throw new IOException("write beyond end of stream");
        }
        if ((off | len | (off + len) | (b.length - (off + len))) < 0) {
            throw new IndexOutOfBoundsException();
        } else if (len == 0) {
            return;
        }
        if (!def.finished()) {
            def.setInput(b, off, len);
            while (!def.needsInput()) {
                deflate();
            }
        }
    }
}

class GZIPOutputStream extends DeflaterOutputStream {
    public synchronized void write(byte[] buf, int off, int len)
        throws IOException
    {
        super.write(buf, off, len);
        crc.update(buf, off, len);
    }
}

class Deflater {
    private native int deflateBytes(long addr, byte[] b, int off, int len, int flush);
}

class CRC32 {
    public void update(byte[] b, int off, int len) {
        if (b == null) {
            throw new NullPointerException();
        }
        if (off < 0 || len < 0 || off > b.length - len) {
            throw new ArrayIndexOutOfBoundsException();
        }
        crc = updateBytes(crc, b, off, len);
    }

    private native static int updateBytes(int crc, byte[] b, int off, int len);
}
{code}
For each data byte written through write(int), the code above has to allocate a 
one-byte array, acquire several locks, and call two native JNI methods 
(Deflater.deflateBytes and CRC32.updateBytes). 

> Introduce buffer between Java data stream and gzip stream
> -
>
> Key: SPARK-23347
> URL: https://issues.apache.org/jira/browse/SPARK-23347
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Ted Yu
>Priority: Minor
>
> Currently GZIPOutputStream is used directly around ByteArrayOutputStream 
> e.g. from KVStoreSerializer :
> {code}
>   ByteArrayOutputStream bytes = new ByteArrayOutputStream();
>   GZIPOutputStream out = new GZIPOutputStream(bytes);
> {code}
> This seems inefficient.
> GZIPOutputStream does not override the single-byte write(int) method; the 
> inherited DeflaterOutputStream implementation wraps each byte in a one-element 
> array and ends up calling the corresponding JNI zlib function per byte.
> A BufferedOutputStream can be introduced wrapping the GZIPOutputStream for 
> better performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23347) Introduce buffer between Java data stream and gzip stream

2018-02-06 Thread Ted Yu (JIRA)
Ted Yu created SPARK-23347:
--

 Summary: Introduce buffer between Java data stream and gzip stream
 Key: SPARK-23347
 URL: https://issues.apache.org/jira/browse/SPARK-23347
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.2.0
Reporter: Ted Yu


Currently GZIPOutputStream is used directly around ByteArrayOutputStream 
e.g. from KVStoreSerializer :
{code}
  ByteArrayOutputStream bytes = new ByteArrayOutputStream();
  GZIPOutputStream out = new GZIPOutputStream(bytes);
{code}
This seems inefficient.
GZIPOutputStream does not override the single-byte write(int) method; the 
inherited DeflaterOutputStream implementation wraps each byte in a one-element 
array and ends up calling the corresponding JNI zlib function per byte.

A BufferedOutputStream can be introduced wrapping the GZIPOutputStream for 
better performance.
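
A minimal sketch of the proposed wrapping (the 8 KB buffer size is an arbitrary choice for illustration):
{code}
import java.io.{BufferedOutputStream, ByteArrayOutputStream}
import java.util.zip.GZIPOutputStream

val bytes = new ByteArrayOutputStream()
// interpose a buffer so byte-at-a-time writes are coalesced into larger
// chunks before they reach the native deflate/CRC32 calls
val out = new BufferedOutputStream(new GZIPOutputStream(bytes), 8 * 1024)

out.write("example payload".getBytes("UTF-8"))
out.close()   // flushes the buffer and finishes the gzip stream
println(s"compressed size = ${bytes.size()} bytes")
{code}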



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10561) Provide tooling for auto-generating Spark SQL reference manual

2016-10-27 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-10561:
---
Labels: tool  (was: )

> Provide tooling for auto-generating Spark SQL reference manual
> --
>
> Key: SPARK-10561
> URL: https://issues.apache.org/jira/browse/SPARK-10561
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, SQL
>Reporter: Ted Yu
>  Labels: tool
>
> Here is the discussion thread:
> http://search-hadoop.com/m/q3RTtcD20F1o62xE
> Richard Hillegas made the following suggestion:
> A machine-generated BNF, however, is easy to imagine. But perhaps not so easy 
> to implement. Spark's SQL grammar is implemented in Scala, extending the DSL 
> support provided by the Scala language. I am new to programming in Scala, so 
> I don't know whether the Scala ecosystem provides any good tools for 
> reverse-engineering a BNF from a class which extends 
> scala.util.parsing.combinator.syntactical.StandardTokenParsers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10868) monotonicallyIncreasingId() supports offset for indexing

2016-08-17 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15425607#comment-15425607
 ] 

Ted Yu commented on SPARK-10868:


bq. this makes sense

What was the reasoning behind the initial assessment?
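
As an aside, the requested behaviour can be approximated today by shifting the generated ids with a literal offset (a sketch against the Spark 2.x API; the generated ids are still not guaranteed to be consecutive):
{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{lit, monotonically_increasing_id}

val spark = SparkSession.builder().appName("id-offset-sketch").getOrCreate()
import spark.implicits._

val offset = 200L
val withIds = Seq("a", "b", "c").toDF("value")
  .withColumn("id", monotonically_increasing_id() + lit(offset))

withIds.show()
{code}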

> monotonicallyIncreasingId() supports offset for indexing
> 
>
> Key: SPARK-10868
> URL: https://issues.apache.org/jira/browse/SPARK-10868
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Martin Senne
>
> With SPARK-7135 and https://github.com/apache/spark/pull/5709 
> `monotonicallyIncreasingID()` allows to create an index column with unique 
> ids. The indexing always starts at 0 (no offset).
> *Feature wish*
> Having a parameter `offset`, such that the function can be used as
> {{monotonicallyIncreasingID( offset )}}
> and indexing _starts at *offset* instead of 0_.
> *Use-case* 
> Add rows to a DataFrame that is already written to a DB (via 
> _.write.jdbc(...)_).
> In detail:
> - A DataFrame *A* (containing an ID column) and having indices from 0 to 199 
> in that column is existent in DB.
> - New rows need to be added to *A*. This included
> -- Creating a DataFrame *A'* with new rows, but without id column
> -- Add the index column to *A'* - this time starting at *200*, as there are 
> already entries with id's from 0 to 199 (*here, monotonicallyInreasingID( 200 
> ) is required.*)
> -- union *A* and *A'*
> -- store into DB



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10868) monotonicallyIncreasingId() supports offset for indexing

2016-08-09 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414164#comment-15414164
 ] 

Ted Yu commented on SPARK-10868:


I can pick up this one if Martin hasn't started working on it.

> monotonicallyIncreasingId() supports offset for indexing
> 
>
> Key: SPARK-10868
> URL: https://issues.apache.org/jira/browse/SPARK-10868
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Martin Senne
>
> With SPARK-7135 and https://github.com/apache/spark/pull/5709 
> `monotonicallyIncreasingID()` allows to create an index column with unique 
> ids. The indexing always starts at 0 (no offset).
> *Feature wish*
> Having a parameter `offset`, such that the function can be used as
> {{monotonicallyIncreasingID( offset )}}
> and indexing _starts at *offset* instead of 0_.
> *Use-case* 
> Add rows to a DataFrame that is already written to a DB (via 
> _.write.jdbc(...)_).
> In detail:
> - A DataFrame *A* (containing an ID column) and having indices from 0 to 199 
> in that column is existent in DB.
> - New rows need to be added to *A*. This included
> -- Creating a DataFrame *A'* with new rows, but without id column
> -- Add the index column to *A'* - this time starting at *200*, as there are 
> already entries with id's from 0 to 199 (*here, monotonicallyInreasingID( 200 
> ) is required.*)
> -- union *A* and *A'*
> -- store into DB



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15273) YarnSparkHadoopUtil#getOutOfMemoryErrorArgument should respect OnOutOfMemoryError parameter given by user

2016-05-11 Thread Ted Yu (JIRA)
Ted Yu created SPARK-15273:
--

 Summary: YarnSparkHadoopUtil#getOutOfMemoryErrorArgument should 
respect OnOutOfMemoryError parameter given by user
 Key: SPARK-15273
 URL: https://issues.apache.org/jira/browse/SPARK-15273
 Project: Spark
  Issue Type: Bug
Reporter: Ted Yu


As Nirav reported in this thread:
http://search-hadoop.com/m/q3RTtdF3yNLMd7u

YarnSparkHadoopUtil#getOutOfMemoryErrorArgument previously specified 'kill %p' 
unconditionally.
We should respect the parameter given by the user.
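
For illustration, a user-supplied handler would typically be passed like this (a sketch; the script path is hypothetical):
{code}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  // the user-provided OnOutOfMemoryError action should be honored instead of
  // being overridden with an unconditional 'kill %p'
  .set("spark.executor.extraJavaOptions",
    "-XX:OnOutOfMemoryError='/opt/scripts/handle-oom.sh %p'")
{code}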



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15003) Use ConcurrentHashMap in place of HashMap for NewAccumulator.originals

2016-04-29 Thread Ted Yu (JIRA)
Ted Yu created SPARK-15003:
--

 Summary: Use ConcurrentHashMap in place of HashMap for 
NewAccumulator.originals
 Key: SPARK-15003
 URL: https://issues.apache.org/jira/browse/SPARK-15003
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Ted Yu


This issue proposes to use ConcurrentHashMap in place of HashMap for 
NewAccumulator.originals.

This should result in better performance.

This change also removes AccumulatorContext#accumIds.
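
A minimal sketch of the concurrent-registration pattern this enables (the field and value types below are simplified placeholders, not the actual NewAccumulator signature):
{code}
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicLong

object AccumulatorRegistrySketch {
  private val nextId = new AtomicLong(0L)
  // ConcurrentHashMap allows registration and lookup from multiple task threads
  // without wrapping every access in an explicit synchronized block
  private val originals = new ConcurrentHashMap[Long, AnyRef]()

  def register(acc: AnyRef): Long = {
    val id = nextId.getAndIncrement()
    originals.putIfAbsent(id, acc)
    id
  }

  def get(id: Long): Option[AnyRef] = Option(originals.get(id))
}
{code}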



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14448) Improvements to ColumnVector

2016-04-06 Thread Ted Yu (JIRA)
Ted Yu created SPARK-14448:
--

 Summary: Improvements to ColumnVector
 Key: SPARK-14448
 URL: https://issues.apache.org/jira/browse/SPARK-14448
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Ted Yu
Priority: Minor


In this issue, two changes are proposed for ColumnVector:

1. ColumnVector should be declared as implementing AutoCloseable - it already 
has a close() method.

2. In OnHeapColumnVector#reserveInternal(), we only need to allocate a new array 
when the existing array is null or shorter than newCapacity (see the sketch 
below).
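
A sketch in Scala of the guard proposed in item 2, under the assumption that the 
backing store is a plain int array (the real OnHeapColumnVector is written in Java 
and handles several element types):

{code}
// Reuse the existing backing array when it is already large enough; only
// allocate (and copy) when it is null or shorter than the requested capacity.
def reserveIntData(existing: Array[Int], newCapacity: Int): Array[Int] = {
  if (existing == null || existing.length < newCapacity) {
    val newData = new Array[Int](newCapacity)
    if (existing != null) {
      System.arraycopy(existing, 0, newData, 0, existing.length)
    }
    newData
  } else {
    existing
  }
}
{code}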



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13663) Upgrade Snappy Java to 1.1.2.1

2016-03-03 Thread Ted Yu (JIRA)
Ted Yu created SPARK-13663:
--

 Summary: Upgrade Snappy Java to 1.1.2.1
 Key: SPARK-13663
 URL: https://issues.apache.org/jira/browse/SPARK-13663
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Ted Yu
Priority: Minor


The JVM memory leak problem reported in 
https://github.com/xerial/snappy-java/issues/131 has been resolved.

1.1.2.1 was released on Jan 22nd.

We should upgrade to this release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10561) Provide tooling for auto-generating Spark SQL reference manual

2016-02-21 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-10561:
---
Description: 
Here is the discussion thread:
http://search-hadoop.com/m/q3RTtcD20F1o62xE

Richard Hillegas made the following suggestion:

A machine-generated BNF, however, is easy to imagine. But perhaps not so easy 
to implement. Spark's SQL grammar is implemented in Scala, extending the DSL 
support provided by the Scala language. I am new to programming in Scala, so I 
don't know whether the Scala ecosystem provides any good tools for 
reverse-engineering a BNF from a class which extends 
scala.util.parsing.combinator.syntactical.StandardTokenParsers.

  was:
Here is the discussion thread:
http://search-hadoop.com/m/q3RTtcD20F1o62xE

Richard Hillegas made the following suggestion:


A machine-generated BNF, however, is easy to imagine. But perhaps not so easy 
to implement. Spark's SQL grammar is implemented in Scala, extending the DSL 
support provided by the Scala language. I am new to programming in Scala, so I 
don't know whether the Scala ecosystem provides any good tools for 
reverse-engineering a BNF from a class which extends 
scala.util.parsing.combinator.syntactical.StandardTokenParsers.


> Provide tooling for auto-generating Spark SQL reference manual
> --
>
> Key: SPARK-10561
> URL: https://issues.apache.org/jira/browse/SPARK-10561
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, SQL
>Reporter: Ted Yu
>
> Here is the discussion thread:
> http://search-hadoop.com/m/q3RTtcD20F1o62xE
> Richard Hillegas made the following suggestion:
> A machine-generated BNF, however, is easy to imagine. But perhaps not so easy 
> to implement. Spark's SQL grammar is implemented in Scala, extending the DSL 
> support provided by the Scala language. I am new to programming in Scala, so 
> I don't know whether the Scala ecosystem provides any good tools for 
> reverse-engineering a BNF from a class which extends 
> scala.util.parsing.combinator.syntactical.StandardTokenParsers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13180) Protect against SessionState being null when accessing HiveClientImpl#conf

2016-02-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15137674#comment-15137674
 ] 

Ted Yu commented on SPARK-13180:


I wonder if we should provide a better error message when the NPE happens - the 
cause may be mixed dependencies. See the last response on the thread.
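
A minimal sketch of the kind of guard and clearer message being discussed, assuming 
Hive's SessionState may come back null (names are illustrative, not the actual patch):

{code}
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.ql.session.SessionState

// Fail with an actionable message instead of a bare NullPointerException.
def confOf(state: SessionState): HiveConf = {
  if (state == null) {
    throw new IllegalStateException(
      "Hive SessionState is null; this can be caused by mixed Hive dependencies " +
        "on the classpath. Verify that only one Hive version is present.")
  }
  state.getConf
}
{code}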

> Protect against SessionState being null when accessing HiveClientImpl#conf
> --
>
> Key: SPARK-13180
> URL: https://issues.apache.org/jira/browse/SPARK-13180
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Ted Yu
>Priority: Minor
> Attachments: spark-13180-util.patch
>
>
> See this thread http://search-hadoop.com/m/q3RTtFoTDi2HVCrM1
> {code}
> java.lang.NullPointerException
> at 
> org.apache.spark.sql.hive.client.ClientWrapper.conf(ClientWrapper.scala:205)
> at 
> org.apache.spark.sql.hive.HiveContext.hiveconf$lzycompute(HiveContext.scala:552)
> at org.apache.spark.sql.hive.HiveContext.hiveconf(HiveContext.scala:551)
> at 
> org.apache.spark.sql.hive.HiveContext$$anonfun$configure$1.apply(HiveContext.scala:538)
> at 
> org.apache.spark.sql.hive.HiveContext$$anonfun$configure$1.apply(HiveContext.scala:537)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> at scala.collection.AbstractTraversable.map(Traversable.scala:105)
> at org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:537)
> at 
> org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250)
> at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237)
> at org.apache.spark.sql.hive.HiveContext$$anon$2.<init>(HiveContext.scala:457)
> at 
> org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:457)
> at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:456)
> at org.apache.spark.sql.hive.HiveContext$$anon$3.<init>(HiveContext.scala:473)
> at 
> org.apache.spark.sql.hive.HiveContext.analyzer$lzycompute(HiveContext.scala:473)
> at org.apache.spark.sql.hive.HiveContext.analyzer(HiveContext.scala:472)
> at 
> org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
> at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)
> at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
> at 
> org.apache.spark.sql.SQLContext.baseRelationToDataFrame(SQLContext.scala:442)
> at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:223)
> at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:146)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10561) Provide tooling for auto-generating Spark SQL reference manual

2016-02-08 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-10561:
---
Description: 
Here is the discussion thread:
http://search-hadoop.com/m/q3RTtcD20F1o62xE

Richard Hillegas made the following suggestion:


A machine-generated BNF, however, is easy to imagine. But perhaps not so easy 
to implement. Spark's SQL grammar is implemented in Scala, extending the DSL 
support provided by the Scala language. I am new to programming in Scala, so I 
don't know whether the Scala ecosystem provides any good tools for 
reverse-engineering a BNF from a class which extends 
scala.util.parsing.combinator.syntactical.StandardTokenParsers.

  was:
Here is the discussion thread:
http://search-hadoop.com/m/q3RTtcD20F1o62xE

Richard Hillegas made the following suggestion:

A machine-generated BNF, however, is easy to imagine. But perhaps not so easy 
to implement. Spark's SQL grammar is implemented in Scala, extending the DSL 
support provided by the Scala language. I am new to programming in Scala, so I 
don't know whether the Scala ecosystem provides any good tools for 
reverse-engineering a BNF from a class which extends 
scala.util.parsing.combinator.syntactical.StandardTokenParsers.


> Provide tooling for auto-generating Spark SQL reference manual
> --
>
> Key: SPARK-10561
> URL: https://issues.apache.org/jira/browse/SPARK-10561
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, SQL
>Reporter: Ted Yu
>
> Here is the discussion thread:
> http://search-hadoop.com/m/q3RTtcD20F1o62xE
> Richard Hillegas made the following suggestion:
> A machine-generated BNF, however, is easy to imagine. But perhaps not so easy 
> to implement. Spark's SQL grammar is implemented in Scala, extending the DSL 
> support provided by the Scala language. I am new to programming in Scala, so 
> I don't know whether the Scala ecosystem provides any good tools for 
> reverse-engineering a BNF from a class which extends 
> scala.util.parsing.combinator.syntactical.StandardTokenParsers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13120) Shade protobuf-java

2016-02-07 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15136383#comment-15136383
 ] 

Ted Yu commented on SPARK-13120:


I don't think this JIRA is limited to the scenario from the thread initially 
cited.

I have dropped that from the description.
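
To make the relocation concrete, a hedged sbt-assembly style sketch (the Spark build 
itself performs shading through the maven-shade-plugin; this is illustration only):

{code}
// build.sbt fragment; assumes the sbt-assembly plugin is available.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.protobuf.**" -> "org.spark-project.protobuf.@1").inAll
)
{code}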

> Shade protobuf-java
> ---
>
> Key: SPARK-13120
> URL: https://issues.apache.org/jira/browse/SPARK-13120
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Ted Yu
>
> https://groups.google.com/forum/#!topic/protobuf/wAqvtPLBsE8
> PB2 and PB3 are wire compatible, but protobuf-java is not, so the dependency 
> will be a problem.
> Shading protobuf-java would provide a better experience for downstream projects.
> This issue shades com.google.protobuf:protobuf-java as 
> org.spark-project.protobuf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13120) Shade protobuf-java

2016-02-07 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-13120:
---
Description: 
https://groups.google.com/forum/#!topic/protobuf/wAqvtPLBsE8
PB2 and PB3 are wire compatible, but protobuf-java is not, so the dependency 
will be a problem.
Shading protobuf-java would provide a better experience for downstream projects.

This issue shades com.google.protobuf:protobuf-java as 
org.spark-project.protobuf

  was:
See this thread for background information:

http://search-hadoop.com/m/q3RTtdkUFK11xQhP1/Spark+not+able+to+fetch+events+from+Amazon+Kinesis

This issue shades com.google.protobuf:protobuf-java as 
org.spark-project.protobuf


> Shade protobuf-java
> ---
>
> Key: SPARK-13120
> URL: https://issues.apache.org/jira/browse/SPARK-13120
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Ted Yu
>
> https://groups.google.com/forum/#!topic/protobuf/wAqvtPLBsE8
> PB2 and PB3 are wire compatible, but protobuf-java is not, so the dependency 
> will be a problem.
> Shading protobuf-java would provide a better experience for downstream projects.
> This issue shades com.google.protobuf:protobuf-java as 
> org.spark-project.protobuf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13120) Shade protobuf-java

2016-02-07 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15136320#comment-15136320
 ] 

Ted Yu commented on SPARK-13120:


I do see the advantage of shading protobuf-java: it would improve the user 
experience where wire compatibility across different protobuf versions is 
provided.

> Shade protobuf-java
> ---
>
> Key: SPARK-13120
> URL: https://issues.apache.org/jira/browse/SPARK-13120
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Ted Yu
>
> See this thread for background information:
> http://search-hadoop.com/m/q3RTtdkUFK11xQhP1/Spark+not+able+to+fetch+events+from+Amazon+Kinesis
> This issue shades com.google.protobuf:protobuf-java as 
> org.spark-project.protobuf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13171) Update promise & future to Promise and Future as the old ones are deprecated

2016-02-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15136006#comment-15136006
 ] 

Ted Yu commented on SPARK-13171:


https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7/134/
 is green.

Let's keep an eye on future builds.

> Update promise & future to Promise and Future as the old ones are deprecated
> 
>
> Key: SPARK-13171
> URL: https://issues.apache.org/jira/browse/SPARK-13171
> Project: Spark
>  Issue Type: Sub-task
>Reporter: holdenk
>Assignee: Jakob Odersky
>Priority: Trivial
> Fix For: 2.0.0
>
>
> We use the promise and future functions on the concurrent object, both of 
> which have been deprecated in 2.11. The full traits are present in Scala 
> 2.10 as well, so this should be a safe migration.
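
A minimal sketch of the migration described above, assuming Scala 2.11 (the 
deprecated lower-case factories are shown commented out):

{code}
import scala.concurrent.{Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global

// Deprecated in Scala 2.11:
//   val f = scala.concurrent.future { 42 }
//   val p = scala.concurrent.promise[Int]()

// Replacement using the companion objects:
val f = Future { 42 }
val p = Promise[Int]()
p.trySuccess(1)
{code}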



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13171) Update promise & future to Promise and Future as the old ones are deprecated

2016-02-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15136004#comment-15136004
 ] 

Ted Yu commented on SPARK-13171:


No, I have not.

bq. Could it be that Hive does some funky reflection

Could be.

> Update promise & future to Promise and Future as the old ones are deprecated
> 
>
> Key: SPARK-13171
> URL: https://issues.apache.org/jira/browse/SPARK-13171
> Project: Spark
>  Issue Type: Sub-task
>Reporter: holdenk
>Assignee: Jakob Odersky
>Priority: Trivial
> Fix For: 2.0.0
>
>
> We use the promise and future functions on the concurrent object, both of 
> which have been deprecated in 2.11. The full traits are present in Scala 
> 2.10 as well, so this should be a safe migration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13171) Update promise & future to Promise and Future as the old ones are deprecated

2016-02-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135972#comment-15135972
 ] 

Ted Yu commented on SPARK-13171:


The above Jenkins jobs were running Scala 2.11

> Update promise & future to Promise and Future as the old ones are deprecated
> 
>
> Key: SPARK-13171
> URL: https://issues.apache.org/jira/browse/SPARK-13171
> Project: Spark
>  Issue Type: Sub-task
>Reporter: holdenk
>Assignee: Jakob Odersky
>Priority: Trivial
> Fix For: 2.0.0
>
>
> We use the promise and future functions on the concurrent object, both of 
> which have been deprecated in 2.11. The full traits are present in Scala 
> 2.10 as well, so this should be a safe migration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13171) Update promise & future to Promise and Future as the old ones are deprecated

2016-02-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135784#comment-15135784
 ] 

Ted Yu commented on SPARK-13171:


Turns out this is not that trivial.

For the build against hadoop 2.4, a test timed out:

https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.4/lastCompletedBuild/testReport/org.apache.spark.sql.hive/HiveSparkSubmitSuite/SPARK_8020__set_sql_conf_in_spark_conf/

For the build against hadoop 2.7, the Jenkins job timed out.

> Update promise & future to Promise and Future as the old ones are deprecated
> 
>
> Key: SPARK-13171
> URL: https://issues.apache.org/jira/browse/SPARK-13171
> Project: Spark
>  Issue Type: Sub-task
>Reporter: holdenk
>Assignee: Jakob Odersky
>Priority: Trivial
> Fix For: 2.0.0
>
>
> We use the promise and future functions on the concurrent object, both of 
> which have been deprecated in 2.11. The full traits are present in Scala 
> 2.10 as well, so this should be a safe migration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13180) Protect against SessionState being null when accessing HiveClientImpl#conf

2016-02-05 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-13180:
---
Attachment: spark-13180-util.patch

Patch with HiveConfUtil.scala, attached for the record.

> Protect against SessionState being null when accessing HiveClientImpl#conf
> --
>
> Key: SPARK-13180
> URL: https://issues.apache.org/jira/browse/SPARK-13180
> Project: Spark
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
> Attachments: spark-13180-util.patch
>
>
> See this thread http://search-hadoop.com/m/q3RTtFoTDi2HVCrM1
> {code}
> java.lang.NullPointerException
> at 
> org.apache.spark.sql.hive.client.ClientWrapper.conf(ClientWrapper.scala:205)
> at 
> org.apache.spark.sql.hive.HiveContext.hiveconf$lzycompute(HiveContext.scala:552)
> at org.apache.spark.sql.hive.HiveContext.hiveconf(HiveContext.scala:551)
> at 
> org.apache.spark.sql.hive.HiveContext$$anonfun$configure$1.apply(HiveContext.scala:538)
> at 
> org.apache.spark.sql.hive.HiveContext$$anonfun$configure$1.apply(HiveContext.scala:537)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> at scala.collection.AbstractTraversable.map(Traversable.scala:105)
> at org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:537)
> at 
> org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250)
> at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237)
> at org.apache.spark.sql.hive.HiveContext$$anon$2.<init>(HiveContext.scala:457)
> at 
> org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:457)
> at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:456)
> at org.apache.spark.sql.hive.HiveContext$$anon$3.<init>(HiveContext.scala:473)
> at 
> org.apache.spark.sql.hive.HiveContext.analyzer$lzycompute(HiveContext.scala:473)
> at org.apache.spark.sql.hive.HiveContext.analyzer(HiveContext.scala:472)
> at 
> org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
> at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)
> at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
> at 
> org.apache.spark.sql.SQLContext.baseRelationToDataFrame(SQLContext.scala:442)
> at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:223)
> at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:146)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13199) Upgrade apache httpclient version to the latest 4.5 for security

2016-02-05 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15134022#comment-15134022
 ] 

Ted Yu commented on SPARK-13199:


This task should be done when there is a Hadoop release that upgrades 
httpclient to 4.5+.

I logged this as a Bug since it deals with CVEs.
See HADOOP-12767.

> Upgrade apache httpclient version to the latest 4.5 for security
> 
>
> Key: SPARK-13199
> URL: https://issues.apache.org/jira/browse/SPARK-13199
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Ted Yu
>Priority: Minor
>
> Various SSL security fixes are needed. 
> See:  CVE-2012-6153, CVE-2011-4461, CVE-2014-3577, CVE-2015-5262.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13204) Replace use of mutable.SynchronizedMap with ConcurrentHashMap

2016-02-05 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15134006#comment-15134006
 ] 

Ted Yu commented on SPARK-13204:


Just noticed that the parent JIRA was marked trivial.

I should have done the search first.

> Replace use of mutable.SynchronizedMap with ConcurrentHashMap
> -
>
> Key: SPARK-13204
> URL: https://issues.apache.org/jira/browse/SPARK-13204
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Ted Yu
>Priority: Trivial
>
> From 
> http://www.scala-lang.org/api/2.11.1/index.html#scala.collection.mutable.SynchronizedMap
>  :
> Synchronization via traits is deprecated as it is inherently unreliable. 
> Consider java.util.concurrent.ConcurrentHashMap as an alternative.
> This issue is to replace the use of mutable.SynchronizedMap and add a 
> scalastyle rule banning mutable.SynchronizedMap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13204) Replace use of mutable.SynchronizedMap with ConcurrentHashMap

2016-02-04 Thread Ted Yu (JIRA)
Ted Yu created SPARK-13204:
--

 Summary: Replace use of mutable.SynchronizedMap with 
ConcurrentHashMap
 Key: SPARK-13204
 URL: https://issues.apache.org/jira/browse/SPARK-13204
 Project: Spark
  Issue Type: Bug
Reporter: Ted Yu


From 
http://www.scala-lang.org/api/2.11.1/index.html#scala.collection.mutable.SynchronizedMap :

Synchronization via traits is deprecated as it is inherently unreliable. 
Consider java.util.concurrent.ConcurrentHashMap as an alternative.

This issue is to replace the use of mutable.SynchronizedMap and add a scalastyle 
rule banning mutable.SynchronizedMap.
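
A minimal sketch of the replacement, for illustration only:

{code}
import java.util.concurrent.ConcurrentHashMap
import scala.collection.JavaConverters._

// Deprecated pattern that triggers the warning:
//   val m = new scala.collection.mutable.HashMap[String, Int]
//             with scala.collection.mutable.SynchronizedMap[String, Int]

// Recommended replacement:
val m = new ConcurrentHashMap[String, Int]()
m.put("a", 1)
val snapshot = m.asScala.toMap   // read-only Scala copy when needed
{code}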



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13203) Add scalastyle rule banning use of mutable.SynchronizedBuffer

2016-02-04 Thread Ted Yu (JIRA)
Ted Yu created SPARK-13203:
--

 Summary: Add scalastyle rule banning use of 
mutable.SynchronizedBuffer 
 Key: SPARK-13203
 URL: https://issues.apache.org/jira/browse/SPARK-13203
 Project: Spark
  Issue Type: Improvement
  Components: Build
Reporter: Ted Yu
Priority: Minor


From SPARK-13164:
Building with Scala 2.11 results in the warning: trait SynchronizedBuffer in 
package mutable is deprecated: Synchronization via traits is deprecated as it 
is inherently unreliable. Consider java.util.concurrent.ConcurrentLinkedQueue 
as an alternative.

This issue adds a scalastyle rule banning the use of mutable.SynchronizedBuffer.
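
For reference, a minimal sketch of the replacement the warning suggests (illustrative 
only; the scalastyle rule itself is a regex-based check in scalastyle-config.xml):

{code}
import java.util.concurrent.ConcurrentLinkedQueue

// Deprecated pattern that triggers the warning:
//   val buf = new scala.collection.mutable.ArrayBuffer[Int]
//               with scala.collection.mutable.SynchronizedBuffer[Int]

// Recommended replacement:
val queue = new ConcurrentLinkedQueue[Int]()
queue.add(1)
val head = Option(queue.poll())
{code}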



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13199) Upgrade apache httpclient version to the latest 4.5 for security

2016-02-04 Thread Ted Yu (JIRA)
Ted Yu created SPARK-13199:
--

 Summary: Upgrade apache httpclient version to the latest 4.5 for 
security
 Key: SPARK-13199
 URL: https://issues.apache.org/jira/browse/SPARK-13199
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: Ted Yu


Various SSL security fixes are needed. 
See:  CVE-2012-6153, CVE-2011-4461, CVE-2014-3577, CVE-2015-5262.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13180) Protect against SessionState being null when accessing HiveClientImpl#conf

2016-02-03 Thread Ted Yu (JIRA)
Ted Yu created SPARK-13180:
--

 Summary: Protect against SessionState being null when accessing 
HiveClientImpl#conf
 Key: SPARK-13180
 URL: https://issues.apache.org/jira/browse/SPARK-13180
 Project: Spark
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


See this thread http://search-hadoop.com/m/q3RTtFoTDi2HVCrM1
{code}
java.lang.NullPointerException
at org.apache.spark.sql.hive.client.ClientWrapper.conf(ClientWrapper.scala:205)
at 
org.apache.spark.sql.hive.HiveContext.hiveconf$lzycompute(HiveContext.scala:552)
at org.apache.spark.sql.hive.HiveContext.hiveconf(HiveContext.scala:551)
at 
org.apache.spark.sql.hive.HiveContext$$anonfun$configure$1.apply(HiveContext.scala:538)
at 
org.apache.spark.sql.hive.HiveContext$$anonfun$configure$1.apply(HiveContext.scala:537)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:537)
at 
org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250)
at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237)
at org.apache.spark.sql.hive.HiveContext$$anon$2.<init>(HiveContext.scala:457)
at 
org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:457)
at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:456)
at org.apache.spark.sql.hive.HiveContext$$anon$3.<init>(HiveContext.scala:473)
at 
org.apache.spark.sql.hive.HiveContext.analyzer$lzycompute(HiveContext.scala:473)
at org.apache.spark.sql.hive.HiveContext.analyzer(HiveContext.scala:472)
at 
org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
at org.apache.spark.sql.SQLContext.baseRelationToDataFrame(SQLContext.scala:442)
at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:146)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13120) Shade protobuf-java

2016-02-02 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15128759#comment-15128759
 ] 

Ted Yu commented on SPARK-13120:


https://groups.google.com/forum/#!topic/protobuf/wAqvtPLBsE8

PB2 and PB3 are wire compatible, but protobuf-java is not, so the dependency 
will be a problem.
Shading protobuf-java would provide a better experience for downstream projects.

> Shade protobuf-java
> ---
>
> Key: SPARK-13120
> URL: https://issues.apache.org/jira/browse/SPARK-13120
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Ted Yu
>
> See this thread for background information:
> http://search-hadoop.com/m/q3RTtdkUFK11xQhP1/Spark+not+able+to+fetch+events+from+Amazon+Kinesis
> This issue shades com.google.protobuf:protobuf-java as 
> org.spark-project.protobuf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13084) Utilize @SerialVersionUID to avoid local class incompatibility

2016-02-01 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved SPARK-13084.

Resolution: Won't Fix

> Utilize @SerialVersionUID to avoid local class incompatibility
> --
>
> Key: SPARK-13084
> URL: https://issues.apache.org/jira/browse/SPARK-13084
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Ted Yu
>
> Here is related thread:
> http://search-hadoop.com/m/q3RTtSjjdT1BJ4Jr/local+class+incompatible&subj=local+class+incompatible+stream+classdesc+serialVersionUID
> RDD extends Serializable but doesn't have @SerialVersionUID() annotation.
> Adding @SerialVersionUID would overcome local class incompatibility across 
> minor releases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13120) Shade protobuf-java

2016-02-01 Thread Ted Yu (JIRA)
Ted Yu created SPARK-13120:
--

 Summary: Shade protobuf-java
 Key: SPARK-13120
 URL: https://issues.apache.org/jira/browse/SPARK-13120
 Project: Spark
  Issue Type: Improvement
  Components: Build
Reporter: Ted Yu


See this thread for background information:

http://search-hadoop.com/m/q3RTtdkUFK11xQhP1/Spark+not+able+to+fetch+events+from+Amazon+Kinesis

This issue shades com.google.protobuf:protobuf-java as 
org.spark-project.protobuf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13084) Utilize @SerialVersionUID to avoid local class incompatibility

2016-01-29 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-13084:
---
Component/s: Spark Core

> Utilize @SerialVersionUID to avoid local class incompatibility
> --
>
> Key: SPARK-13084
> URL: https://issues.apache.org/jira/browse/SPARK-13084
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Ted Yu
>
> Here is related thread:
> http://search-hadoop.com/m/q3RTtSjjdT1BJ4Jr/local+class+incompatible&subj=local+class+incompatible+stream+classdesc+serialVersionUID
> RDD extends Serializable but doesn't have @SerialVersionUID() annotation.
> Adding @SerialVersionUID would overcome local class incompatibility across 
> minor releases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13084) Utilize @SerialVersionUID to avoid local class incompatibility

2016-01-29 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-13084:
---
Description: 
Here is related thread:

http://search-hadoop.com/m/q3RTtSjjdT1BJ4Jr/local+class+incompatible&subj=local+class+incompatible+stream+classdesc+serialVersionUID

RDD extends Serializable but doesn't have @SerialVersionUID() annotation.

Adding @SerialVersionUID would overcome local class incompatibility across 
minor releases.


  was:
Here is related thread:

http://search-hadoop.com/m/q3RTtSjjdT1BJ4Jr/local+class+incompatible&subj=local+class+incompatible+stream+classdesc+serialVersionUID

RDD extends Serializable but doesn't have @SerialVersionUID() annotation.

Adding @SerialVersionUID would overcome local class incompatibility.



> Utilize @SerialVersionUID to avoid local class incompatibility
> --
>
> Key: SPARK-13084
> URL: https://issues.apache.org/jira/browse/SPARK-13084
> Project: Spark
>  Issue Type: Bug
>Reporter: Ted Yu
>
> Here is related thread:
> http://search-hadoop.com/m/q3RTtSjjdT1BJ4Jr/local+class+incompatible&subj=local+class+incompatible+stream+classdesc+serialVersionUID
> RDD extends Serializable but doesn't have @SerialVersionUID() annotation.
> Adding @SerialVersionUID would overcome local class incompatibility across 
> minor releases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13084) Utilize @SerialVersionUID to avoid local class incompatibility

2016-01-29 Thread Ted Yu (JIRA)
Ted Yu created SPARK-13084:
--

 Summary: Utilize @SerialVersionUID to avoid local class 
incompatibility
 Key: SPARK-13084
 URL: https://issues.apache.org/jira/browse/SPARK-13084
 Project: Spark
  Issue Type: Bug
Reporter: Ted Yu


Here is related thread:

http://search-hadoop.com/m/q3RTtSjjdT1BJ4Jr/local+class+incompatible&subj=local+class+incompatible+stream+classdesc+serialVersionUID

RDD extends Serializable but doesn't have an @SerialVersionUID() annotation.

Adding @SerialVersionUID would overcome local class incompatibility.
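
A minimal sketch of the proposal (the UID value and class are illustrative, not the 
actual change to RDD):

{code}
// Pinning the serialVersionUID keeps serialized instances compatible across
// minor releases that do not change the serialized form.
@SerialVersionUID(1L)
class ExampleSerializable extends Serializable {
  val payload: Seq[Int] = Seq(1, 2, 3)
}
{code}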




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13022) Shade jackson core

2016-01-27 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved SPARK-13022.

Resolution: Won't Fix

> Shade jackson core
> --
>
> Key: SPARK-13022
> URL: https://issues.apache.org/jira/browse/SPARK-13022
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Reporter: Ted Yu
>Priority: Minor
>
> See the thread for background information:
> http://search-hadoop.com/m/q3RTtYuufRO7LLG
> This issue proposes to shade com.fasterxml.jackson.core as 
> org.spark-project.jackson



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13022) Shade jackson core

2016-01-26 Thread Ted Yu (JIRA)
Ted Yu created SPARK-13022:
--

 Summary: Shade jackson core
 Key: SPARK-13022
 URL: https://issues.apache.org/jira/browse/SPARK-13022
 Project: Spark
  Issue Type: Improvement
  Components: Deploy
Reporter: Ted Yu
Priority: Minor


See the thread for background information:
http://search-hadoop.com/m/q3RTtYuufRO7LLG

This issue proposes to shade com.fasterxml.jackson.core as 
org.spark-project.jackson



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12778) Use of Java Unsafe should take endianness into account

2016-01-12 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-12778:
---
Description: 
In Platform.java, methods of Java Unsafe are called directly without 
considering endianness.

In thread, 'Tungsten in a mixed endian environment', Adam Roberts reported data 
corruption when "spark.sql.tungsten.enabled" is enabled in mixed endian 
environment.

Platform.java should take endianness into account.

Below is a copy of Adam's report:

I've been experimenting with DataFrame operations in a mixed endian environment 
- a big endian master with little endian workers. With tungsten enabled I'm 
encountering data corruption issues. 

For example, with this simple test code: 
{code}
import org.apache.spark.SparkContext
import org.apache.spark._
import org.apache.spark.sql.SQLContext

object SimpleSQL {
  def main(args: Array[String]): Unit = {
if (args.length != 1) {
  println("Not enough args, you need to specify the master url")
}
val masterURL = args(0)
println("Setting up Spark context at: " + masterURL)
val sparkConf = new SparkConf
val sc = new SparkContext(masterURL, "Unsafe endian test", sparkConf)

println("Performing SQL tests")

val sqlContext = new SQLContext(sc)
println("SQL context set up")
val df = sqlContext.read.json("/tmp/people.json")
df.show()
println("Selecting everyone's age and adding one to it")
df.select(df("name"), df("age") + 1).show()
println("Showing all people over the age of 21")
df.filter(df("age") > 21).show()
println("Counting people by age")
df.groupBy("age").count().show()
  }
} 
{code}
Instead of getting 
{code}
+----+-----+
| age|count|
+----+-----+
|null|    1|
|  19|    1|
|  30|    1|
+----+-----+
{code}
I get the following with my mixed endian set up: 
{code}
+-------------------+-----------------+
|                age|            count|
+-------------------+-----------------+
|               null|                1|
|1369094286720630784|72057594037927936|
|                 30|                1|
+-------------------+-----------------+
{code}
and on another run: 
{code}
+---+-----------------+
|age|            count|
+---+-----------------+
|  0|72057594037927936|
| 19|                1|
{code}

  was:
In Platform.java, methods of Java Unsafe are called directly without 
considering endianness.

In thread, 'Tungsten in a mixed endian environment', Adam Roberts reported data 
corruption when "spark.sql.tungsten.enabled" is enabled in mixed endian 
environment.

Platform.java should take endianness into account.

Below is a copy of Adam's report:

I've been experimenting with DataFrame operations in a mixed endian environment 
- a big endian master with little endian workers. With tungsten enabled I'm 
encountering data corruption issues. 

For example, with this simple test code: 
{code}
import org.apache.spark.SparkContext
import org.apache.spark._
import org.apache.spark.sql.SQLContext

object SimpleSQL {
  def main(args: Array[String]): Unit = {
if (args.length != 1) {
  println("Not enough args, you need to specify the master url")
}
val masterURL = args(0)
println("Setting up Spark context at: " + masterURL)
val sparkConf = new SparkConf
val sc = new SparkContext(masterURL, "Unsafe endian test", sparkConf)

println("Performing SQL tests")

val sqlContext = new SQLContext(sc)
println("SQL context set up")
val df = sqlContext.read.json("/tmp/people.json")
df.show()
println("Selecting everyone's age and adding one to it")
df.select(df("name"), df("age") + 1).show()
println("Showing all people over the age of 21")
df.filter(df("age") > 21).show()
println("Counting people by age")
df.groupBy("age").count().show()
  }
} 
{code}
Instead of getting 

+----+-----+
| age|count|
+----+-----+
|null|    1|
|  19|    1|
|  30|    1|
+----+-----+

I get the following with my mixed endian set up: 

+-------------------+-----------------+
|                age|            count|
+-------------------+-----------------+
|               null|                1|
|1369094286720630784|72057594037927936|
|                 30|                1|
+-------------------+-----------------+

and on another run: 

+---+-----------------+
|age|            count|
+---+-----------------+
|  0|72057594037927936|
| 19|                1|



> Use of Java Unsafe should take endianness into account
> --
>
> Key: SPARK-12778
> URL: https://issues.apache.org/jira/browse/SPARK-12778
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Reporter: Ted Yu
>
> In Platfor

[jira] [Updated] (SPARK-12778) Use of Java Unsafe should take endianness into account

2016-01-12 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-12778:
---
Description: 
In Platform.java, methods of Java Unsafe are called directly without 
considering endianness.

In thread, 'Tungsten in a mixed endian environment', Adam Roberts reported data 
corruption when "spark.sql.tungsten.enabled" is enabled in mixed endian 
environment.

Platform.java should take endianness into account.

Below is a copy of Adam's report:

I've been experimenting with DataFrame operations in a mixed endian environment 
- a big endian master with little endian workers. With tungsten enabled I'm 
encountering data corruption issues. 

For example, with this simple test code: 
{code}
import org.apache.spark.SparkContext
import org.apache.spark._
import org.apache.spark.sql.SQLContext

object SimpleSQL {
  def main(args: Array[String]): Unit = {
if (args.length != 1) {
  println("Not enough args, you need to specify the master url")
}
val masterURL = args(0)
println("Setting up Spark context at: " + masterURL)
val sparkConf = new SparkConf
val sc = new SparkContext(masterURL, "Unsafe endian test", sparkConf)

println("Performing SQL tests")

val sqlContext = new SQLContext(sc)
println("SQL context set up")
val df = sqlContext.read.json("/tmp/people.json")
df.show()
println("Selecting everyone's age and adding one to it")
df.select(df("name"), df("age") + 1).show()
println("Showing all people over the age of 21")
df.filter(df("age") > 21).show()
println("Counting people by age")
df.groupBy("age").count().show()
  }
} 
{code}
Instead of getting 

+----+-----+
| age|count|
+----+-----+
|null|    1|
|  19|    1|
|  30|    1|
+----+-----+

I get the following with my mixed endian set up: 

+-------------------+-----------------+
|                age|            count|
+-------------------+-----------------+
|               null|                1|
|1369094286720630784|72057594037927936|
|                 30|                1|
+-------------------+-----------------+

and on another run: 

+---+-----------------+
|age|            count|
+---+-----------------+
|  0|72057594037927936|
| 19|                1|


  was:
In Platform.java, methods of Java Unsafe are called directly without 
considering endianness.

In thread, 'Tungsten in a mixed endian environment', Adam Roberts reported data 
corruption when "spark.sql.tungsten.enabled" is enabled in mixed endian 
environment.

Platform.java should take endianness into account.


> Use of Java Unsafe should take endianness into account
> --
>
> Key: SPARK-12778
> URL: https://issues.apache.org/jira/browse/SPARK-12778
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Reporter: Ted Yu
>
> In Platform.java, methods of Java Unsafe are called directly without 
> considering endianness.
> In thread, 'Tungsten in a mixed endian environment', Adam Roberts reported 
> data corruption when "spark.sql.tungsten.enabled" is enabled in mixed endian 
> environment.
> Platform.java should take endianness into account.
> Below is a copy of Adam's report:
> I've been experimenting with DataFrame operations in a mixed endian 
> environment - a big endian master with little endian workers. With tungsten 
> enabled I'm encountering data corruption issues. 
> For example, with this simple test code: 
> {code}
> import org.apache.spark.SparkContext
> import org.apache.spark._
> import org.apache.spark.sql.SQLContext
> object SimpleSQL {
>   def main(args: Array[String]): Unit = {
> if (args.length != 1) {
>   println("Not enough args, you need to specify the master url")
> }
> val masterURL = args(0)
> println("Setting up Spark context at: " + masterURL)
> val sparkConf = new SparkConf
> val sc = new SparkContext(masterURL, "Unsafe endian test", sparkConf)
> println("Performing SQL tests")
> val sqlContext = new SQLContext(sc)
> println("SQL context set up")
> val df = sqlContext.read.json("/tmp/people.json")
> df.show()
> println("Selecting everyone's age and adding one to it")
> df.select(df("name"), df("age") + 1).show()
> println("Showing all people over the age of 21")
> df.filter(df("age") > 21).show()
> println("Counting people by age")
> df.groupBy("age").count().show()
>   }
> } 
> {code}
> Instead of getting 
> ++-+
> | age|count|
> ++-+
> |null|1|
> |  19|1|
> |  30|1|
> ++-+ 
> I get the following with my mixed endian set up: 
> +---+-+
> |age|count|
> +---+-+
> |   null|

[jira] [Commented] (SPARK-12778) Use of Java Unsafe should take endianness into account

2016-01-12 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094090#comment-15094090
 ] 

Ted Yu commented on SPARK-12778:


I have seen SPARK-12555, but it didn't identify where in the codebase the 
problem is.

I checked unsafe/src/main/java/org/apache/spark/unsafe/Platform.java in the 
master branch, where endianness is not taken into account.

> Use of Java Unsafe should take endianness into account
> --
>
> Key: SPARK-12778
> URL: https://issues.apache.org/jira/browse/SPARK-12778
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Reporter: Ted Yu
>
> In Platform.java, methods of Java Unsafe are called directly without 
> considering endianness.
> In thread, 'Tungsten in a mixed endian environment', Adam Roberts reported 
> data corruption when "spark.sql.tungsten.enabled" is enabled in mixed endian 
> environment.
> Platform.java should take endianness into account.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-12778) Use of Java Unsafe should take endianness into account

2016-01-12 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094078#comment-15094078
 ] 

Ted Yu edited comment on SPARK-12778 at 1/12/16 3:33 PM:
-

I did perform a search in JIRA for Unsafe related Spark JIRAs but didn't find 
any unresolved one.

I checked master branch code base before logging this JIRA.

When the above thread is indexed by search-hadoop, I will add a link.


was (Author: yuzhih...@gmail.com):
I did perform a search in JIRA for Unsafe related Spark JIRAs but didn't find 
any unresolved one.

I checked master branch code base before logging this JIRA.

> Use of Java Unsafe should take endianness into account
> --
>
> Key: SPARK-12778
> URL: https://issues.apache.org/jira/browse/SPARK-12778
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Reporter: Ted Yu
>
> In Platform.java, methods of Java Unsafe are called directly without 
> considering endianness.
> In thread, 'Tungsten in a mixed endian environment', Adam Roberts reported 
> data corruption when "spark.sql.tungsten.enabled" is enabled in mixed endian 
> environment.
> Platform.java should take endianness into account.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12778) Use of Java Unsafe should take endianness into account

2016-01-12 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-12778:
---
Component/s: Input/Output

> Use of Java Unsafe should take endianness into account
> --
>
> Key: SPARK-12778
> URL: https://issues.apache.org/jira/browse/SPARK-12778
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Reporter: Ted Yu
>
> In Platform.java, methods of Java Unsafe are called directly without 
> considering endianness.
> In thread, 'Tungsten in a mixed endian environment', Adam Roberts reported 
> data corruption when "spark.sql.tungsten.enabled" is enabled in mixed endian 
> environment.
> Platform.java should take endianness into account.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12778) Use of Java Unsafe should take endianness into account

2016-01-12 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094078#comment-15094078
 ] 

Ted Yu commented on SPARK-12778:


I did perform a search in JIRA for Unsafe related Spark JIRAs but didn't find 
any unresolved one.

I checked master branch code base before logging this JIRA.

> Use of Java Unsafe should take endianness into account
> --
>
> Key: SPARK-12778
> URL: https://issues.apache.org/jira/browse/SPARK-12778
> Project: Spark
>  Issue Type: Bug
>Reporter: Ted Yu
>
> In Platform.java, methods of Java Unsafe are called directly without 
> considering endianness.
> In thread, 'Tungsten in a mixed endian environment', Adam Roberts reported 
> data corruption when "spark.sql.tungsten.enabled" is enabled in mixed endian 
> environment.
> Platform.java should take endianness into account.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12778) Use of Java Unsafe should take endianness into account

2016-01-12 Thread Ted Yu (JIRA)
Ted Yu created SPARK-12778:
--

 Summary: Use of Java Unsafe should take endianness into account
 Key: SPARK-12778
 URL: https://issues.apache.org/jira/browse/SPARK-12778
 Project: Spark
  Issue Type: Bug
Reporter: Ted Yu


In Platform.java, methods of Java Unsafe are called directly without 
considering endianness.

In the thread 'Tungsten in a mixed endian environment', Adam Roberts reported 
data corruption when "spark.sql.tungsten.enabled" is enabled in a mixed endian 
environment.

Platform.java should take endianness into account.
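
A minimal sketch of the kind of endianness handling being proposed, assuming 
java.nio.ByteOrder is used to detect the native order (illustrative only, not the 
actual Platform.java change):

{code}
import java.nio.ByteOrder

// Normalize multi-byte values to a single byte order so that data written on
// little-endian workers is read back correctly on a big-endian master.
val nativeIsLittleEndian: Boolean =
  ByteOrder.nativeOrder() == ByteOrder.LITTLE_ENDIAN

def toLittleEndian(value: Long): Long =
  if (nativeIsLittleEndian) value else java.lang.Long.reverseBytes(value)
{code}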



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-4066) Make whether maven builds fails on scalastyle violation configurable

2016-01-07 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved SPARK-4066.
---
Resolution: Later

Haven't heard much feedback so far.

Resolving for now.

> Make whether maven builds fails on scalastyle violation configurable
> 
>
> Key: SPARK-4066
> URL: https://issues.apache.org/jira/browse/SPARK-4066
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Ted Yu
>Priority: Minor
>  Labels: style
> Attachments: spark-4066-v1.txt
>
>
> Here is the thread Koert started:
> http://search-hadoop.com/m/JW1q5j8z422/scalastyle+annoys+me+a+little+bit&subj=scalastyle+annoys+me+a+little+bit
> It would be flexible if whether maven build fails due to scalastyle violation 
> configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12527) Add private val after @transient for kinesis-asl module

2015-12-26 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved SPARK-12527.

Resolution: Duplicate

Dup of SPARK-1252

> Add private val after @transient for kinesis-asl module
> ---
>
> Key: SPARK-12527
> URL: https://issues.apache.org/jira/browse/SPARK-12527
> Project: Spark
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Apache Spark
>
> In SBT build using Scala 2.11, the following warnings were reported which 
> resulted in build failure 
> (https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/SPARK-branch-1.6-COMPILE-SBT-SCALA-2.11/3/consoleFull):
> {code}
> [error] [warn] 
> /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala:33:
>  no valid targets for annotation on  value _ssc - it is discarded unused. You 
> may specify targets with meta-annotations, e.g. @(transient @param)
> [error] [warn] @transient _ssc: StreamingContext,
> [error] [warn]
> [error] [warn] 
> /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:73:
>  no valid targets for annotation   on value sc - it is discarded unused. You 
> may specify targets with meta-annotations, e.g. @(transient @param)
> [error] [warn] @transient sc: SparkContext,
> [error] [warn]
> [error] [warn] 
> /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:76:
>  no valid targets for annotation   on value blockIds - it is discarded 
> unused. You may specify targets with meta-annotations, e.g. @(transient 
> @param)
> [error] [warn] @transient blockIds: Array[BlockId],
> [error] [warn]
> [error] [warn] 
> /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:78:
>  no valid targets for annotation   on value isBlockIdValid - it is discarded 
> unused. You may specify targets with meta-annotations, e.g. @(transient 
> @param)
> [error] [warn] @transient isBlockIdValid: Array[Boolean] = Array.empty,
> {code}
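
The fix named in the title is to spell the constructor parameters as @transient 
private val, so the annotation has a field to target; a minimal sketch with 
simplified types (not the actual patch):

{code}
import org.apache.spark.SparkContext

// Scala 2.11 discards @transient on a plain constructor parameter; adding
// (private) val turns the parameter into a field the annotation can target.
class ExampleBackedRDDArgs(
    @transient private val sc: SparkContext,
    @transient private val blockIds: Array[String]) extends Serializable
{code}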



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-12527) Add private val after @transient for kinesis-asl module

2015-12-25 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-12527:
---
Comment: was deleted

(was: https://github.com/apache/spark/pull/10482)

> Add private val after @transient for kinesis-asl module
> ---
>
> Key: SPARK-12527
> URL: https://issues.apache.org/jira/browse/SPARK-12527
> Project: Spark
>  Issue Type: Bug
>Reporter: Ted Yu
>
> In SBT build using Scala 2.11, the following warnings were reported which 
> resulted in build failure 
> (https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/SPARK-branch-1.6-COMPILE-SBT-SCALA-2.11/3/consoleFull):
> {code}
> [error] [warn] 
> /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala:33:
>  no valid targets for annotation on  value _ssc - it is discarded unused. You 
> may specify targets with meta-annotations, e.g. @(transient @param)
> [error] [warn] @transient _ssc: StreamingContext,
> [error] [warn]
> [error] [warn] 
> /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:73:
>  no valid targets for annotation   on value sc - it is discarded unused. You 
> may specify targets with meta-annotations, e.g. @(transient @param)
> [error] [warn] @transient sc: SparkContext,
> [error] [warn]
> [error] [warn] 
> /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:76:
>  no valid targets for annotation   on value blockIds - it is discarded 
> unused. You may specify targets with meta-annotations, e.g. @(transient 
> @param)
> [error] [warn] @transient blockIds: Array[BlockId],
> [error] [warn]
> [error] [warn] 
> /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:78:
>  no valid targets for annotation   on value isBlockIdValid - it is discarded 
> unused. You may specify targets with meta-annotations, e.g. @(transient 
> @param)
> [error] [warn] @transient isBlockIdValid: Array[Boolean] = Array.empty,
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12527) Add private val after @transient for kinesis-asl module

2015-12-25 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-12527:
---
Description: 
In the SBT build using Scala 2.11, the following warnings were reported, which 
resulted in a build failure 
(https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/SPARK-branch-1.6-COMPILE-SBT-SCALA-2.11/3/consoleFull):
{code}
[error] [warn] 
/dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala:33:
 no valid targets for annotation on  value _ssc - it is discarded unused. You 
may specify targets with meta-annotations, e.g. @(transient @param)
[error] [warn] @transient _ssc: StreamingContext,
[error] [warn]
[error] [warn] 
/dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:73:
 no valid targets for annotation   on value sc - it is discarded unused. You 
may specify targets with meta-annotations, e.g. @(transient @param)
[error] [warn] @transient sc: SparkContext,
[error] [warn]
[error] [warn] 
/dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:76:
 no valid targets for annotation   on value blockIds - it is discarded unused. 
You may specify targets with meta-annotations, e.g. @(transient @param)
[error] [warn] @transient blockIds: Array[BlockId],
[error] [warn]
[error] [warn] 
/dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:78:
 no valid targets for annotation   on value isBlockIdValid - it is discarded 
unused. You may specify targets with meta-annotations, e.g. @(transient @param)
[error] [warn] @transient isBlockIdValid: Array[Boolean] = Array.empty,
{code}

  was:
In the SBT build using Scala 2.11, the following warnings were reported, which 
resulted in a build failure:
{code}
[error] [warn] 
/dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala:33:
 no valid targets for annotation on  value _ssc - it is discarded unused. You 
may specify targets with meta-annotations, e.g. @(transient @param)
[error] [warn] @transient _ssc: StreamingContext,
[error] [warn]
[error] [warn] 
/dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:73:
 no valid targets for annotation   on value sc - it is discarded unused. You 
may specify targets with meta-annotations, e.g. @(transient @param)
[error] [warn] @transient sc: SparkContext,
[error] [warn]
[error] [warn] 
/dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:76:
 no valid targets for annotation   on value blockIds - it is discarded unused. 
You may specify targets with meta-annotations, e.g. @(transient @param)
[error] [warn] @transient blockIds: Array[BlockId],
[error] [warn]
[error] [warn] 
/dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:78:
 no valid targets for annotation   on value isBlockIdValid - it is discarded 
unused. You may specify targets with meta-annotations, e.g. @(transient @param)
[error] [warn] @transient isBlockIdValid: Array[Boolean] = Array.empty,
{code}


> Add private val after @transient for kinesis-asl module
> ---
>
> Key: SPARK-12527
> URL: https://issues.apache.org/jira/browse/SPARK-12527
> Project: Spark
>  Issue Type: Bug
>Reporter: Ted Yu
>
> In the SBT build using Scala 2.11, the following warnings were reported, which 
> resulted in a build failure 
> (https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/SPARK-branch-1.6-COMPILE-SBT-SCALA-2.11/3/consoleFull):
> {code}
> [error] [warn] 
> /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala:33:
>  no valid targets for annotation on  value _ssc - it is discarded unused. You 
> may specify targets with meta-annotations, e.g. @(transient @param)
> [error] [warn] @transient _ssc: StreamingContext,
> [error] [warn]
> [error] [warn] 
> /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:73:
>  no valid targets for annotation   on value sc - it is discarded unused. You 
> may specify targets with meta-annotations, e.g. @(transient @param)
> [error] [warn] @transient sc: SparkContext,
> [error] [warn]
> [error] [warn] 
> /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:76:
>  no valid targets for annotation   on value blockIds - it is discarded 
> unused. You may specify targets with meta-annotations, e.g. @(transient 
> @param)
> [error] [warn] @transient blockIds: Array[BlockId],
> [error] [warn]
> [error] [warn] 
> /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:78:
>  no valid targets for annotation   on value isBlockIdValid - it is discarded 
> unused. You may specify targets with meta-annotations, e.g. @(transient 
> @param)
> [error] [warn] @transient isBlockIdValid: Array[Boolean] = Array.empty,
> {code}

[jira] [Commented] (SPARK-12527) Add private val after @transient for kinesis-asl module

2015-12-25 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15071818#comment-15071818
 ] 

Ted Yu commented on SPARK-12527:


https://github.com/apache/spark/pull/10482

> Add private val after @transient for kinesis-asl module
> ---
>
> Key: SPARK-12527
> URL: https://issues.apache.org/jira/browse/SPARK-12527
> Project: Spark
>  Issue Type: Bug
>Reporter: Ted Yu
>
> In the SBT build using Scala 2.11, the following warnings were reported, which 
> resulted in a build failure:
> {code}
> [error] [warn] 
> /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala:33:
>  no valid targets for annotation on  value _ssc - it is discarded unused. You 
> may specify targets with meta-annotations, e.g. @(transient @param)
> [error] [warn] @transient _ssc: StreamingContext,
> [error] [warn]
> [error] [warn] 
> /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:73:
>  no valid targets for annotation   on value sc - it is discarded unused. You 
> may specify targets with meta-annotations, e.g. @(transient @param)
> [error] [warn] @transient sc: SparkContext,
> [error] [warn]
> [error] [warn] 
> /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:76:
>  no valid targets for annotation   on value blockIds - it is discarded 
> unused. You may specify targets with meta-annotations, e.g. @(transient 
> @param)
> [error] [warn] @transient blockIds: Array[BlockId],
> [error] [warn]
> [error] [warn] 
> /dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:78:
>  no valid targets for annotation   on value isBlockIdValid - it is discarded 
> unused. You may specify targets with meta-annotations, e.g. @(transient 
> @param)
> [error] [warn] @transient isBlockIdValid: Array[Boolean] = Array.empty,
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12527) Add private val after @transient for kinesis-asl module

2015-12-25 Thread Ted Yu (JIRA)
Ted Yu created SPARK-12527:
--

 Summary: Add private val after @transient for kinesis-asl module
 Key: SPARK-12527
 URL: https://issues.apache.org/jira/browse/SPARK-12527
 Project: Spark
  Issue Type: Bug
Reporter: Ted Yu


In the SBT build using Scala 2.11, the following warnings were reported, which 
resulted in a build failure:
{code}
[error] [warn] 
/dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala:33:
 no valid targets for annotation on  value _ssc - it is discarded unused. You 
may specify targets with meta-annotations, e.g. @(transient @param)
[error] [warn] @transient _ssc: StreamingContext,
[error] [warn]
[error] [warn] 
/dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:73:
 no valid targets for annotation   on value sc - it is discarded unused. You 
may specify targets with meta-annotations, e.g. @(transient @param)
[error] [warn] @transient sc: SparkContext,
[error] [warn]
[error] [warn] 
/dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:76:
 no valid targets for annotation   on value blockIds - it is discarded unused. 
You may specify targets with meta-annotations, e.g. @(transient @param)
[error] [warn] @transient blockIds: Array[BlockId],
[error] [warn]
[error] [warn] 
/dev/shm/spark-workspaces/8/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:78:
 no valid targets for annotation   on value isBlockIdValid - it is discarded 
unused. You may specify targets with meta-annotations, e.g. @(transient @param)
[error] [warn] @transient isBlockIdValid: Array[Boolean] = Array.empty,
{code}
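
The fix proposed in the summary is to add private val after @transient so the annotation has a field to attach to instead of a bare constructor parameter. A minimal sketch of that change is below; the class name and parameter list are illustrative stand-ins modeled on the warnings above, not the actual kinesis-asl sources.
{code}
import org.apache.spark.SparkContext
import org.apache.spark.storage.BlockId

// Before: @transient sits on a plain constructor parameter, which scalac
// discards, producing the "no valid targets for annotation" warning:
//   class Example(@transient sc: SparkContext, ...)

// After: "private val" turns each parameter into a field, giving @transient
// a valid target and silencing the warning.
class Example(
    @transient private val sc: SparkContext,
    @transient private val blockIds: Array[BlockId],
    @transient private val isBlockIdValid: Array[Boolean] = Array.empty)
{code}
The meta-annotation form the compiler suggests, e.g. @(transient @param), would also clear the warning; adding private val is simply the smaller change for these classes.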



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12365) Use ShutdownHookManager where Runtime.getRuntime.addShutdownHook() is called

2015-12-16 Thread Ted Yu (JIRA)
Ted Yu created SPARK-12365:
--

 Summary: Use ShutdownHookManager where 
Runtime.getRuntime.addShutdownHook() is called
 Key: SPARK-12365
 URL: https://issues.apache.org/jira/browse/SPARK-12365
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Ted Yu
Priority: Minor


SPARK-9886 fixed the call to Runtime.getRuntime.addShutdownHook() in 
ExternalBlockStore.scala.

This issue intends to address the remaining usages of 
Runtime.getRuntime.addShutdownHook().
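
A rough sketch of the kind of replacement this issue asks for, assuming Spark's org.apache.spark.util.ShutdownHookManager and its addShutdownHook(() => Unit) entry point; since that utility is private[spark], the illustrative call site below is placed inside Spark's own package tree, and the cleanup body is hypothetical.
{code}
package org.apache.spark.util   // hypothetical location inside Spark's source tree

object ShutdownHookExample {
  private def cleanup(): Unit = {
    // release external resources held by this component
  }

  // Before: a raw JVM hook that bypasses Spark's ordered shutdown.
  //   Runtime.getRuntime.addShutdownHook(new Thread { override def run(): Unit = cleanup() })

  // After: register through ShutdownHookManager so the hook participates in
  // its priority ordering and error handling; keep the returned reference in
  // case the hook needs to be removed later via removeShutdownHook.
  private val hookRef = ShutdownHookManager.addShutdownHook { () => cleanup() }
}
{code}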



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12181) Check Cached unaligned-access capability before using Unsafe

2015-12-07 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-12181:
---
Attachment: spark-12181.txt

> Check Cached unaligned-access capability before using Unsafe
> 
>
> Key: SPARK-12181
> URL: https://issues.apache.org/jira/browse/SPARK-12181
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Ted Yu
> Attachments: spark-12181.txt
>
>
> For MemoryMode.OFF_HEAP, Unsafe.getInt etc. are used with no restriction.
> However, the Oracle implementation uses these methods only if the class 
> variable unaligned (commented as "Cached unaligned-access capability") is 
> true, which seems to be calculated based on whether the architecture is i386, 
> x86, amd64, or x86_64.
> I think we should perform a similar check before using Unsafe.
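
A sketch of the capability check described above, assuming reflective access to the JDK's private java.nio.Bits#unaligned method (the source of the "Cached unaligned-access capability" flag), with the architecture list from the description as a fallback; the helper name is illustrative, not Spark's actual code.
{code}
object UnalignedCheck {
  // Ask the JDK for its own cached unaligned-access capability via reflection
  // on java.nio.Bits#unaligned, the flag referred to in the description.
  val unaligned: Boolean =
    try {
      val bits = Class.forName("java.nio.Bits", false, ClassLoader.getSystemClassLoader)
      val method = bits.getDeclaredMethod("unaligned")
      method.setAccessible(true)
      method.invoke(null).asInstanceOf[Boolean]
    } catch {
      case _: Throwable =>
        // Fallback: the architecture whitelist quoted in the description.
        Set("i386", "x86", "amd64", "x86_64").contains(System.getProperty("os.arch", ""))
    }
}

// Callers would then guard Unsafe-based access, e.g.
//   if (UnalignedCheck.unaligned) unsafe.getInt(obj, offset) else readBytewise(obj, offset)
{code}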



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12181) Check Cached unaligned-access capability before using Unsafe

2015-12-07 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-12181:
---
Component/s: Spark Core

> Check Cached unaligned-access capability before using Unsafe
> 
>
> Key: SPARK-12181
> URL: https://issues.apache.org/jira/browse/SPARK-12181
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Ted Yu
>
> For MemoryMode.OFF_HEAP, Unsafe.getInt etc. are used with no restriction.
> However, the Oracle implementation uses these methods only if the class 
> variable unaligned (commented as "Cached unaligned-access capability") is 
> true, which seems to be calculated based on whether the architecture is i386, 
> x86, amd64, or x86_64.
> I think we should perform a similar check before using Unsafe.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


