[jira] [Assigned] (SPARK-21818) MultivariateOnlineSummarizer.variance generate negative result

2017-08-27 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-21818:
-

Assignee: Weichen Xu

> MultivariateOnlineSummarizer.variance generate negative result
> --
>
> Key: SPARK-21818
> URL: https://issues.apache.org/jira/browse/SPARK-21818
> Project: Spark
>  Issue Type: Bug
>  Components: ML, MLlib
>Affects Versions: 2.2.0
>Reporter: Weichen Xu
>Assignee: Weichen Xu
> Fix For: 2.3.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Because of numerical error, MultivariateOnlineSummarizer.variance can 
> return a negative variance.
> This is a serious bug because many algorithms in MLlib use a stddev computed as 
> sqrt(variance);
> a negative variance yields NaN and crashes the whole algorithm.
> We can reproduce this bug with the following code:
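> {code}
> // Imports needed when running outside the MLlib sources (Spark 2.2).
> import org.apache.spark.mllib.linalg.Vectors
> import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
>
> // Note: add(sample, weight) appears to be private[spark] in 2.2, so this is
> // meant to run inside the org.apache.spark package (e.g. a test suite).
> // Four single-sample summarizers: same value (3.0), different weights.
> val summarizer1 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.7)
> val summarizer2 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.4)
> val summarizer3 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.5)
> val summarizer4 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.4)
> val summarizer = summarizer1
>   .merge(summarizer2)
>   .merge(summarizer3)
>   .merge(summarizer4)
> // All samples are identical, so the true variance is 0; the bug is that
> // rounding during merge can make the printed value negative.
> println(summarizer.variance(0))
> {code}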
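
The failure mode described in this issue can be illustrated without Spark. The sketch below is only an illustration under stated assumptions, not the change made in the pull request: `VarianceGuard`, `clampVariance`, and `safeStdDev` are hypothetical names, and the negative input is a stand-in for a rounding artifact. It shows that the square root of a slightly negative variance is NaN, and that clamping the variance at zero before taking the square root avoids this.

```scala
object VarianceGuard {
  /** Clamp a computed variance at zero. A true variance is never negative,
    * so a tiny negative value can only be a rounding artifact. */
  def clampVariance(v: Double): Double = math.max(v, 0.0)

  /** Standard deviation that cannot become NaN for a slightly negative input. */
  def safeStdDev(v: Double): Double = math.sqrt(clampVariance(v))

  def main(args: Array[String]): Unit = {
    // Hypothetical value standing in for a variance corrupted by rounding.
    val slightlyNegative = -1.0e-17
    println(math.sqrt(slightlyNegative).isNaN) // unguarded sqrt yields NaN
    println(safeStdDev(slightlyNegative))      // guarded version stays finite
  }
}
```

Clamping hides the rounding error rather than removing it, which is acceptable here only because the mathematically correct value is known to be non-negative.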



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21818) MultivariateOnlineSummarizer.variance generate negative result

2017-08-27 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-21818.
---
   Resolution: Fixed
Fix Version/s: 2.3.0

Issue resolved by pull request 19029
[https://github.com/apache/spark/pull/19029]

> MultivariateOnlineSummarizer.variance generate negative result
> --
>
> Key: SPARK-21818
> URL: https://issues.apache.org/jira/browse/SPARK-21818
> Project: Spark
>  Issue Type: Bug
>  Components: ML, MLlib
>Affects Versions: 2.2.0
>Reporter: Weichen Xu
> Fix For: 2.3.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Because of numerical error, MultivariateOnlineSummarizer.variance can 
> return a negative variance.
> This is a serious bug because many algorithms in MLlib use a stddev computed as 
> sqrt(variance);
> a negative variance yields NaN and crashes the whole algorithm.
> We can reproduce this bug with the following code:
> {code}
> val summarizer1 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.7)
> val summarizer2 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.4)
> val summarizer3 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.5)
> val summarizer4 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.4)
> val summarizer = summarizer1
>   .merge(summarizer2)
>   .merge(summarizer3)
>   .merge(summarizer4)
> println(summarizer.variance(0))
> {code}



[jira] [Commented] (SPARK-21255) NPE when creating encoder for enum

2017-08-27 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143428#comment-16143428
 ] 

Apache Spark commented on SPARK-21255:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/19066

> NPE when creating encoder for enum
> --
>
> Key: SPARK-21255
> URL: https://issues.apache.org/jira/browse/SPARK-21255
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.1.0
> Environment: org.apache.spark:spark-core_2.10:2.1.0
> org.apache.spark:spark-sql_2.10:2.1.0
>Reporter: Mike
>Assignee: Mike
> Fix For: 2.3.0
>
>
> When you try to create an encoder for an enum type (or a bean with an enum 
> property) via Encoders.bean(...), it fails with a NullPointerException at 
> TypeToken:495.
> I did a little research, and it turns out that the following code at 
> JavaTypeInference:126 
> {code:java}
> val beanInfo = Introspector.getBeanInfo(typeToken.getRawType)
> val properties = beanInfo.getPropertyDescriptors.filterNot(_.getName == "class")
> val fields = properties.map { property =>
>   val returnType = typeToken.method(property.getReadMethod).getReturnType
>   val (dataType, nullable) = inferDataType(returnType)
>   new StructField(property.getName, dataType, nullable)
> }
> (new StructType(fields), true)
> {code}
> filters out properties named "class", because we wouldn't want to serialize 
> that. But enum types have another property of type Class, named 
> "declaringClass", which is then inspected recursively. Eventually the 
> ClassLoader class is inspected; it has a property "defaultAssertionStatus" 
> with no read method, which leads to the NPE at TypeToken:495.
> I think adding the property name "declaringClass" to the filter will resolve this.
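
The filtering idea proposed in this description can be sketched with the JDK bean introspector alone. This is a minimal illustration under assumptions, not necessarily what the linked pull request does: `EnumBeanProperties`, `excluded`, and the two helper methods are hypothetical names, and `java.util.concurrent.TimeUnit` is used only as a convenient JDK enum.

```scala
import java.beans.Introspector
import java.util.concurrent.TimeUnit

object EnumBeanProperties {
  // Property names that should not be treated as serializable bean fields.
  // "declaringClass" is the addition suggested in the issue text.
  private val excluded: Set[String] = Set("class", "declaringClass")

  /** Names of all bean properties the introspector reports for a class. */
  def allPropertyNames(cls: Class[_]): Seq[String] =
    Introspector.getBeanInfo(cls).getPropertyDescriptors.map(_.getName).toSeq

  /** Property names left after dropping the excluded ones. */
  def serializablePropertyNames(cls: Class[_]): Seq[String] =
    allPropertyNames(cls).filterNot(excluded.contains)

  def main(args: Array[String]): Unit = {
    println(allPropertyNames(classOf[TimeUnit]))
    println(serializablePropertyNames(classOf[TimeUnit]))
  }
}
```

Filtering by name stops the recursion into `Class` and `ClassLoader`, but it does not by itself decide how an enum value should be encoded; that remains a separate design question.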



[jira] [Commented] (SPARK-21255) NPE when creating encoder for enum

2017-08-27 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143426#comment-16143426
 ] 

Apache Spark commented on SPARK-21255:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/19066

> NPE when creating encoder for enum
> --
>
> Key: SPARK-21255
> URL: https://issues.apache.org/jira/browse/SPARK-21255
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.1.0
> Environment: org.apache.spark:spark-core_2.10:2.1.0
> org.apache.spark:spark-sql_2.10:2.1.0
>Reporter: Mike
>Assignee: Mike
> Fix For: 2.3.0
>
>
> When you try to create an encoder for an enum type (or a bean with an enum 
> property) via Encoders.bean(...), it fails with a NullPointerException at 
> TypeToken:495.
> I did a little research, and it turns out that the following code at 
> JavaTypeInference:126 
> {code:java}
> val beanInfo = Introspector.getBeanInfo(typeToken.getRawType)
> val properties = beanInfo.getPropertyDescriptors.filterNot(_.getName == "class")
> val fields = properties.map { property =>
>   val returnType = typeToken.method(property.getReadMethod).getReturnType
>   val (dataType, nullable) = inferDataType(returnType)
>   new StructField(property.getName, dataType, nullable)
> }
> (new StructType(fields), true)
> {code}
> filters out properties named "class", because we wouldn't want to serialize 
> that. But enum types have another property of type Class, named 
> "declaringClass", which is then inspected recursively. Eventually the 
> ClassLoader class is inspected; it has a property "defaultAssertionStatus" 
> with no read method, which leads to the NPE at TypeToken:495.
> I think adding the property name "declaringClass" to the filter will resolve this.



[jira] [Assigned] (SPARK-21729) Generic test for ProbabilisticClassifier to ensure consistent output columns

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21729:


Assignee: Apache Spark

> Generic test for ProbabilisticClassifier to ensure consistent output columns
> 
>
> Key: SPARK-21729
> URL: https://issues.apache.org/jira/browse/SPARK-21729
> Project: Spark
>  Issue Type: Test
>  Components: ML
>Affects Versions: 2.2.0
>Reporter: Joseph K. Bradley
>Assignee: Apache Spark
>
> One challenge with the ProbabilisticClassifier abstraction is that it 
> introduces different code paths for predictions depending on which output 
> columns are turned on or off: probability, rawPrediction, prediction.  We ran 
> into a bug in MLOR with this.
> This task is for adding a generic test usable in all test suites for 
> ProbabilisticClassifier types which does the following:
> * Take a dataset + Estimator
> * Fit the Estimator
> * Test prediction using the model with all combinations of output columns 
> turned on/off.
> * Make sure the output column values match, presumably by comparing vs. the 
> case with all 3 output columns turned on
> CC [~WeichenXu123] since this came up in 
> https://github.com/apache/spark/pull/17373
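
The "all combinations of output columns turned on/off" step in this description can be sketched in plain Scala. This is only an illustration of the enumeration, assuming the three column names listed in the issue; `OutputColumnCombinations` and `allCombinations` are hypothetical names, and the actual fitting and comparison against a model are left as a comment because they depend on the test suite.

```scala
object OutputColumnCombinations {
  // Output columns named in the issue description.
  val outputColumns: Seq[String] = Seq("probability", "rawPrediction", "prediction")

  /** All on/off assignments for the given columns: 2^n maps of column -> enabled. */
  def allCombinations(cols: Seq[String]): Seq[Map[String, Boolean]] =
    cols.foldLeft(Seq(Map.empty[String, Boolean])) { (acc, col) =>
      for (assignment <- acc; enabled <- Seq(true, false))
        yield assignment + (col -> enabled)
    }

  def main(args: Array[String]): Unit =
    allCombinations(outputColumns).foreach { combo =>
      // A real test would configure the model for this combination, transform
      // the dataset, and compare each enabled column with the baseline in
      // which all three columns are turned on.
      println(combo.filter(_._2).keys.mkString("enabled: [", ", ", "]"))
    }
}
```

Enumerating the assignments up front keeps the comparison logic in one loop, so a new classifier suite would only need to supply the dataset and the estimator.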



[jira] [Assigned] (SPARK-21729) Generic test for ProbabilisticClassifier to ensure consistent output columns

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21729:


Assignee: (was: Apache Spark)

> Generic test for ProbabilisticClassifier to ensure consistent output columns
> 
>
> Key: SPARK-21729
> URL: https://issues.apache.org/jira/browse/SPARK-21729
> Project: Spark
>  Issue Type: Test
>  Components: ML
>Affects Versions: 2.2.0
>Reporter: Joseph K. Bradley
>
> One challenge with the ProbabilisticClassifier abstraction is that it 
> introduces different code paths for predictions depending on which output 
> columns are turned on or off: probability, rawPrediction, prediction.  We ran 
> into a bug in MLOR with this.
> This task is for adding a generic test usable in all test suites for 
> ProbabilisticClassifier types which does the following:
> * Take a dataset + Estimator
> * Fit the Estimator
> * Test prediction using the model with all combinations of output columns 
> turned on/off.
> * Make sure the output column values match, presumably by comparing vs. the 
> case with all 3 output columns turned on
> CC [~WeichenXu123] since this came up in 
> https://github.com/apache/spark/pull/17373



[jira] [Commented] (SPARK-21729) Generic test for ProbabilisticClassifier to ensure consistent output columns

2017-08-27 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143420#comment-16143420
 ] 

Apache Spark commented on SPARK-21729:
--

User 'WeichenXu123' has created a pull request for this issue:
https://github.com/apache/spark/pull/19065

> Generic test for ProbabilisticClassifier to ensure consistent output columns
> 
>
> Key: SPARK-21729
> URL: https://issues.apache.org/jira/browse/SPARK-21729
> Project: Spark
>  Issue Type: Test
>  Components: ML
>Affects Versions: 2.2.0
>Reporter: Joseph K. Bradley
>
> One challenge with the ProbabilisticClassifier abstraction is that it 
> introduces different code paths for predictions depending on which output 
> columns are turned on or off: probability, rawPrediction, prediction.  We ran 
> into a bug in MLOR with this.
> This task is for adding a generic test usable in all test suites for 
> ProbabilisticClassifier types which does the following:
> * Take a dataset + Estimator
> * Fit the Estimator
> * Test prediction using the model with all combinations of output columns 
> turned on/off.
> * Make sure the output column values match, presumably by comparing vs. the 
> case with all 3 output columns turned on
> CC [~WeichenXu123] since this came up in 
> https://github.com/apache/spark/pull/17373



[jira] [Commented] (SPARK-21729) Generic test for ProbabilisticClassifier to ensure consistent output columns

2017-08-27 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143418#comment-16143418
 ] 

Apache Spark commented on SPARK-21729:
--

User 'WeichenXu123' has created a pull request for this issue:
https://github.com/apache/spark/pull/19065

> Generic test for ProbabilisticClassifier to ensure consistent output columns
> 
>
> Key: SPARK-21729
> URL: https://issues.apache.org/jira/browse/SPARK-21729
> Project: Spark
>  Issue Type: Test
>  Components: ML
>Affects Versions: 2.2.0
>Reporter: Joseph K. Bradley
>
> One challenge with the ProbabilisticClassifier abstraction is that it 
> introduces different code paths for predictions depending on which output 
> columns are turned on or off: probability, rawPrediction, prediction.  We ran 
> into a bug in MLOR with this.
> This task is for adding a generic test usable in all test suites for 
> ProbabilisticClassifier types which does the following:
> * Take a dataset + Estimator
> * Fit the Estimator
> * Test prediction using the model with all combinations of output columns 
> turned on/off.
> * Make sure the output column values match, presumably by comparing vs. the 
> case with all 3 output columns turned on
> CC [~WeichenXu123] since this came up in 
> https://github.com/apache/spark/pull/17373



[jira] [Assigned] (SPARK-21839) Support SQL config for ORC compression

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21839:


Assignee: Apache Spark

> Support SQL config for ORC compression 
> ---
>
> Key: SPARK-21839
> URL: https://issues.apache.org/jira/browse/SPARK-21839
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>
> This issue aims to provide `spark.sql.orc.compression.codec` like 
> `spark.sql.parquet.compression.codec`.
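
For orientation, this is roughly how such a setting would be used from Scala. It is a sketch under assumptions: it presumes a Spark build (2.3 or later) with ORC support on the classpath, that the config key was implemented as proposed here, and that "snappy" is an accepted value; the application name and the temporary output path are made up for the example.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object OrcCompressionConfigSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orc-compression-sketch") // hypothetical application name
      .master("local[*]")
      .getOrCreate()

    // Session-wide default codec for ORC writes, mirroring the Parquet setting.
    spark.conf.set("spark.sql.orc.compression.codec", "snappy")

    // Write a small dataset; the temporary directory is only for illustration.
    val out = Files.createTempDirectory("orc-sketch").resolve("data").toString
    spark.range(10).toDF("id").write.mode("overwrite").orc(out)

    spark.stop()
  }
}
```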



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21839) Support SQL config for ORC compression

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21839:


Assignee: (was: Apache Spark)

> Support SQL config for ORC compression 
> ---
>
> Key: SPARK-21839
> URL: https://issues.apache.org/jira/browse/SPARK-21839
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Dongjoon Hyun
>
> This issue aims to provide `spark.sql.orc.compression.codec` like 
> `spark.sql.parquet.compression.codec`.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21839) Support SQL config for ORC compression

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21839:


Assignee: Apache Spark

> Support SQL config for ORC compression 
> ---
>
> Key: SPARK-21839
> URL: https://issues.apache.org/jira/browse/SPARK-21839
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>
> This issue aims to provide `spark.sql.orc.compression.codec` like 
> `spark.sql.parquet.compression.codec`.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21839) Support SQL config for ORC compression

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21839:


Assignee: (was: Apache Spark)

> Support SQL config for ORC compression 
> ---
>
> Key: SPARK-21839
> URL: https://issues.apache.org/jira/browse/SPARK-21839
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Dongjoon Hyun
>
> This issue aims to provide `spark.sql.orc.compression.codec` like 
> `spark.sql.parquet.compression.codec`.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21839) Support SQL config for ORC compression

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21839:


Assignee: (was: Apache Spark)

> Support SQL config for ORC compression 
> ---
>
> Key: SPARK-21839
> URL: https://issues.apache.org/jira/browse/SPARK-21839
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Dongjoon Hyun
>
> This issue aims to provide `spark.sql.orc.compression.codec` like 
> `spark.sql.parquet.compression.codec`.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21839) Support SQL config for ORC compression

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21839:


Assignee: Apache Spark

> Support SQL config for ORC compression 
> ---
>
> Key: SPARK-21839
> URL: https://issues.apache.org/jira/browse/SPARK-21839
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>
> This issue aims to provide `spark.sql.orc.compression.codec` like 
> `spark.sql.parquet.compression.codec`.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21839) Support SQL config for ORC compression

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21839:


Assignee: (was: Apache Spark)

> Support SQL config for ORC compression 
> ---
>
> Key: SPARK-21839
> URL: https://issues.apache.org/jira/browse/SPARK-21839
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Dongjoon Hyun
>
> This issue aims to provide `spark.sql.orc.compression.codec` like 
> `spark.sql.parquet.compression.codec`.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21839) Support SQL config for ORC compression

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21839:


Assignee: Apache Spark

> Support SQL config for ORC compression 
> ---
>
> Key: SPARK-21839
> URL: https://issues.apache.org/jira/browse/SPARK-21839
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>
> This issue aims to provide `spark.sql.orc.compression.codec` like 
> `spark.sql.parquet.compression.codec`.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21839) Support SQL config for ORC compression

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21839:


Assignee: (was: Apache Spark)

> Support SQL config for ORC compression 
> ---
>
> Key: SPARK-21839
> URL: https://issues.apache.org/jira/browse/SPARK-21839
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Dongjoon Hyun
>
> This issue aims to provide `spark.sql.orc.compression.codec` like 
> `spark.sql.parquet.compression.codec`.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21839) Support SQL config for ORC compression

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21839:


Assignee: Apache Spark

> Support SQL config for ORC compression 
> ---
>
> Key: SPARK-21839
> URL: https://issues.apache.org/jira/browse/SPARK-21839
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>
> This issue aims to provide `spark.sql.orc.compression.codec` like 
> `spark.sql.parquet.compression.codec`.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21848) Create trait to identify user-defined functions

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21848:


Assignee: (was: Apache Spark)

> Create trait to identify user-defined functions
> ---
>
> Key: SPARK-21848
> URL: https://issues.apache.org/jira/browse/SPARK-21848
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Gengliang Wang
>Priority: Minor
>
> Create a trait that makes it easier to identify which expressions are 
> user-defined functions.
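
The marker-trait idea in this description can be sketched with a toy expression tree. Everything here is hypothetical and self-contained: `Expression`, `UserDefined`, `Literal`, `Add`, and `ScalaFunctionCall` are stand-ins invented for the example, not Spark's Catalyst classes, and the issue does not say what the real trait is called.

```scala
object UdfMarkerSketch {
  // Toy expression hierarchy standing in for a query engine's expression tree.
  trait Expression { def children: Seq[Expression] }

  // Marker trait: carries no members, only tags an expression as user-defined.
  trait UserDefined

  final case class Literal(value: Any) extends Expression {
    val children: Seq[Expression] = Nil
  }
  final case class Add(left: Expression, right: Expression) extends Expression {
    def children: Seq[Expression] = Seq(left, right)
  }
  final case class ScalaFunctionCall(name: String, children: Seq[Expression])
    extends Expression with UserDefined

  /** True if the expression or any of its descendants is tagged as user-defined. */
  def containsUserDefined(e: Expression): Boolean = e match {
    case _: UserDefined => true
    case other          => other.children.exists(containsUserDefined)
  }

  def main(args: Array[String]): Unit = {
    val plain = Add(Literal(1), Literal(2))
    val withUdf = Add(Literal(1), ScalaFunctionCall("f", Seq(Literal(2))))
    println(containsUserDefined(plain))
    println(containsUserDefined(withUdf))
  }
}
```

With a marker trait, callers match on one type instead of listing every concrete user-defined expression class, so adding a new kind of user-defined function does not require touching the callers.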



[jira] [Assigned] (SPARK-21848) Create trait to identify user-defined functions

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21848:


Assignee: Apache Spark

> Create trait to identify user-defined functions
> ---
>
> Key: SPARK-21848
> URL: https://issues.apache.org/jira/browse/SPARK-21848
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Minor
>
> Create a trait that makes it easier to identify which expressions are 
> user-defined functions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21848) Create trait to identify user-defined functions

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21848:


Assignee: Apache Spark

> Create trait to identify user-defined functions
> ---
>
> Key: SPARK-21848
> URL: https://issues.apache.org/jira/browse/SPARK-21848
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Minor
>
> Create a trait to make it easier for identifying what expressions are 
> user-defined functions



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21848) Create trait to identify user-defined functions

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21848:


Assignee: (was: Apache Spark)

> Create trait to identify user-defined functions
> ---
>
> Key: SPARK-21848
> URL: https://issues.apache.org/jira/browse/SPARK-21848
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Gengliang Wang
>Priority: Minor
>
> Create a trait to make it easier to identify which expressions are
> user-defined functions.



[jira] [Assigned] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21835:


Assignee: Apache Spark

> RewritePredicateSubquery should not produce unresolved query plans
> --
>
> Key: SPARK-21835
> URL: https://issues.apache.org/jira/browse/SPARK-21835
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Liang-Chi Hsieh
>Assignee: Apache Spark
>
> {{RewritePredicateSubquery}} rewrites correlated subqueries into join operations.
> During the structural integrity check, I found that {{RewritePredicateSubquery}} can
> produce unresolved query plans due to conflicting attributes. We should not
> let {{RewritePredicateSubquery}} produce unresolved plans.



[jira] [Assigned] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21835:


Assignee: (was: Apache Spark)

> RewritePredicateSubquery should not produce unresolved query plans
> --
>
> Key: SPARK-21835
> URL: https://issues.apache.org/jira/browse/SPARK-21835
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Liang-Chi Hsieh
>
> {{RewritePredicateSubquery}} rewrites correlated subqueries into join operations.
> During the structural integrity check, I found that {{RewritePredicateSubquery}} can
> produce unresolved query plans due to conflicting attributes. We should not
> let {{RewritePredicateSubquery}} produce unresolved plans.



[jira] [Commented] (SPARK-21848) Create trait to identify user-defined functions

2017-08-27 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143376#comment-16143376
 ] 

Apache Spark commented on SPARK-21848:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/19064

> Create trait to identify user-defined functions
> ---
>
> Key: SPARK-21848
> URL: https://issues.apache.org/jira/browse/SPARK-21848
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Gengliang Wang
>Priority: Minor
>
> Create a trait to make it easier to identify which expressions are
> user-defined functions.



[jira] [Assigned] (SPARK-21583) Create a ColumnarBatch with ArrowColumnVectors for row based iteration

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21583:


Assignee: (was: Apache Spark)

> Create a ColumnarBatch with ArrowColumnVectors for row based iteration
> --
>
> Key: SPARK-21583
> URL: https://issues.apache.org/jira/browse/SPARK-21583
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bryan Cutler
>
> The existing {{ArrowColumnVector}} creates a read-only vector of Arrow data.
> It would be useful to be able to create a {{ColumnarBatch}} that allows
> row-based iteration over multiple {{ArrowColumnVectors}}. This would avoid the extra
> copying needed to translate column elements into rows, using memory more
> efficiently while improving performance.
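
As a rough illustration of the idea above, the following Scala sketch shows row-based iteration over several column vectors without copying elements into row objects. IntColumn, RowView and Batch are hypothetical stand-ins invented for this sketch; they are not Spark's ColumnarBatch or ArrowColumnVector, and the real classes may work differently.

```scala
// Hypothetical stand-ins for illustration only; not Spark or Arrow classes.
final class IntColumn(values: Array[Int]) {
  def length: Int = values.length
  def get(rowId: Int): Int = values(rowId)
}

// A row is just a cursor (rowId) into the shared columns, so no element is copied.
final class RowView(columns: IndexedSeq[IntColumn]) {
  var rowId: Int = 0
  def getInt(ordinal: Int): Int = columns(ordinal).get(rowId)
}

final class Batch(columns: IndexedSeq[IntColumn]) {
  require(columns.nonEmpty && columns.forall(_.length == columns.head.length),
    "all columns must have the same length")

  val numRows: Int = columns.head.length

  // Reuses a single RowView and only advances its rowId on each step.
  def rowIterator: Iterator[RowView] = {
    val row = new RowView(columns)
    Iterator.range(0, numRows).map { i => row.rowId = i; row }
  }
}
```

Because the same RowView instance is reused, each row has to be read before the iterator advances; for example, batch.rowIterator.map(r => r.getInt(0) + r.getInt(1)) sums the first two columns row by row.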



[jira] [Assigned] (SPARK-21583) Create a ColumnarBatch with ArrowColumnVectors for row based iteration

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21583:


Assignee: Apache Spark

> Create a ColumnarBatch with ArrowColumnVectors for row based iteration
> --
>
> Key: SPARK-21583
> URL: https://issues.apache.org/jira/browse/SPARK-21583
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bryan Cutler
>Assignee: Apache Spark
>
> The existing {{ArrowColumnVector}} creates a read-only vector of Arrow data.
> It would be useful to be able to create a {{ColumnarBatch}} that allows
> row-based iteration over multiple {{ArrowColumnVectors}}. This would avoid the extra
> copying needed to translate column elements into rows, using memory more
> efficiently while improving performance.



[jira] [Created] (SPARK-21848) Create trait to identify user-defined functions

2017-08-27 Thread Gengliang Wang (JIRA)
Gengliang Wang created SPARK-21848:
--

 Summary: Create trait to identify user-defined functions
 Key: SPARK-21848
 URL: https://issues.apache.org/jira/browse/SPARK-21848
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 2.2.0
Reporter: Gengliang Wang
Priority: Minor


Create a trait to make it easier to identify which expressions are
user-defined functions.
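
A minimal sketch of what such a marker trait could look like, written against a toy expression tree. Every name here (Expr, UserDefined, Literal, Add, ScalaUdf, userDefinedNames) is hypothetical and chosen for illustration; none of it is taken from Spark's Catalyst code or from the pull request for this issue.

```scala
// Hypothetical, simplified expression tree; these are not Spark's Catalyst classes.
sealed trait Expr { def children: Seq[Expr] }

// Marker trait: mixing it in flags an expression as a user-defined function.
trait UserDefined { def name: String }

case class Literal(value: Any) extends Expr { val children: Seq[Expr] = Nil }

case class Add(left: Expr, right: Expr) extends Expr {
  val children: Seq[Expr] = Seq(left, right)
}

case class ScalaUdf(name: String, children: Seq[Expr]) extends Expr with UserDefined

// Collects the names of all user-defined functions in a tree (pre-order),
// using a single type test instead of enumerating every UDF class.
def userDefinedNames(e: Expr): Seq[String] = {
  val here = e match {
    case u: UserDefined => Seq(u.name)
    case _              => Nil
  }
  here ++ e.children.flatMap(userDefinedNames)
}
```

With a trait like this, code that needs to find user-defined functions can match on one type, and a new kind of UDF expression only has to mix the trait in.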



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21583) Create a ColumnarBatch with ArrowColumnVectors for row based iteration

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21583:


Assignee: (was: Apache Spark)

> Create a ColumnarBatch with ArrowColumnVectors for row based iteration
> --
>
> Key: SPARK-21583
> URL: https://issues.apache.org/jira/browse/SPARK-21583
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bryan Cutler
>
> The existing {{ArrowColumnVector}} creates a read-only vector of Arrow data.  
> It would be useful to be able to create a {{ColumnarBatch}} to allow row 
> based iteration over multiple {{ArrowColumnVectors}}.  This would avoid extra 
> copying to translate column elements into rows and be more efficient memory 
> usage while increasing performance.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21583) Create a ColumnarBatch with ArrowColumnVectors for row based iteration

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21583:


Assignee: Apache Spark

> Create a ColumnarBatch with ArrowColumnVectors for row based iteration
> --
>
> Key: SPARK-21583
> URL: https://issues.apache.org/jira/browse/SPARK-21583
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bryan Cutler
>Assignee: Apache Spark
>
> The existing {{ArrowColumnVector}} creates a read-only vector of Arrow data.  
> It would be useful to be able to create a {{ColumnarBatch}} to allow row 
> based iteration over multiple {{ArrowColumnVectors}}.  This would avoid extra 
> copying to translate column elements into rows and be more efficient memory 
> usage while increasing performance.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21583) Create a ColumnarBatch with ArrowColumnVectors for row based iteration

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21583:


Assignee: Apache Spark

> Create a ColumnarBatch with ArrowColumnVectors for row based iteration
> --
>
> Key: SPARK-21583
> URL: https://issues.apache.org/jira/browse/SPARK-21583
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bryan Cutler
>Assignee: Apache Spark
>
> The existing {{ArrowColumnVector}} creates a read-only vector of Arrow data.  
> It would be useful to be able to create a {{ColumnarBatch}} to allow row 
> based iteration over multiple {{ArrowColumnVectors}}.  This would avoid extra 
> copying to translate column elements into rows and be more efficient memory 
> usage while increasing performance.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21583) Create a ColumnarBatch with ArrowColumnVectors for row based iteration

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21583:


Assignee: (was: Apache Spark)

> Create a ColumnarBatch with ArrowColumnVectors for row based iteration
> --
>
> Key: SPARK-21583
> URL: https://issues.apache.org/jira/browse/SPARK-21583
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bryan Cutler
>
> The existing {{ArrowColumnVector}} creates a read-only vector of Arrow data.  
> It would be useful to be able to create a {{ColumnarBatch}} to allow row 
> based iteration over multiple {{ArrowColumnVectors}}.  This would avoid extra 
> copying to translate column elements into rows and be more efficient memory 
> usage while increasing performance.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21583) Create a ColumnarBatch with ArrowColumnVectors for row based iteration

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21583:


Assignee: (was: Apache Spark)

> Create a ColumnarBatch with ArrowColumnVectors for row based iteration
> --
>
> Key: SPARK-21583
> URL: https://issues.apache.org/jira/browse/SPARK-21583
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bryan Cutler
>
> The existing {{ArrowColumnVector}} creates a read-only vector of Arrow data.  
> It would be useful to be able to create a {{ColumnarBatch}} to allow row 
> based iteration over multiple {{ArrowColumnVectors}}.  This would avoid extra 
> copying to translate column elements into rows and be more efficient memory 
> usage while increasing performance.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21583) Create a ColumnarBatch with ArrowColumnVectors for row based iteration

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21583:


Assignee: Apache Spark

> Create a ColumnarBatch with ArrowColumnVectors for row based iteration
> --
>
> Key: SPARK-21583
> URL: https://issues.apache.org/jira/browse/SPARK-21583
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bryan Cutler
>Assignee: Apache Spark
>
> The existing {{ArrowColumnVector}} creates a read-only vector of Arrow data.  
> It would be useful to be able to create a {{ColumnarBatch}} to allow row 
> based iteration over multiple {{ArrowColumnVectors}}.  This would avoid extra 
> copying to translate column elements into rows and be more efficient memory 
> usage while increasing performance.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21583) Create a ColumnarBatch with ArrowColumnVectors for row based iteration

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21583:


Assignee: (was: Apache Spark)

> Create a ColumnarBatch with ArrowColumnVectors for row based iteration
> --
>
> Key: SPARK-21583
> URL: https://issues.apache.org/jira/browse/SPARK-21583
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bryan Cutler
>
> The existing {{ArrowColumnVector}} creates a read-only vector of Arrow data.
> It would be useful to be able to create a {{ColumnarBatch}} that allows
> row-based iteration over multiple {{ArrowColumnVector}} instances. This would
> avoid the extra copying needed to translate column elements into rows, which
> should use memory more efficiently and improve performance.



[jira] [Assigned] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21835:


Assignee: Apache Spark

> RewritePredicateSubquery should not produce unresolved query plans
> --
>
> Key: SPARK-21835
> URL: https://issues.apache.org/jira/browse/SPARK-21835
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Liang-Chi Hsieh
>Assignee: Apache Spark
>
> {{RewritePredicateSubquery}} rewrites correlated subqueries into join operations.
> During the structural integrity check, I found that {{RewritePredicateSubquery}}
> can produce unresolved query plans due to conflicting attributes. We should not
> let {{RewritePredicateSubquery}} produce unresolved plans.



[jira] [Assigned] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21835:


Assignee: (was: Apache Spark)

> RewritePredicateSubquery should not produce unresolved query plans
> --
>
> Key: SPARK-21835
> URL: https://issues.apache.org/jira/browse/SPARK-21835
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Liang-Chi Hsieh
>
> {{RewritePredicateSubquery}} rewrites correlated subqueries into join operations.
> During the structural integrity check, I found that {{RewritePredicateSubquery}}
> can produce unresolved query plans due to conflicting attributes. We should not
> let {{RewritePredicateSubquery}} produce unresolved plans.



[jira] [Resolved] (SPARK-21847) Where is the lit() function in pyspark 2.2.0?

2017-08-27 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-21847.
--
Resolution: Invalid

Questions should go to the mailing list. I am resolving this; I can't reproduce
it either.

> Where is the lit() function in pyspark 2.2.0?
> -
>
> Key: SPARK-21847
> URL: https://issues.apache.org/jira/browse/SPARK-21847
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.2.0
>Reporter: Neil Huang
>
> When I do this:
> from pyspark.sql.functions import lit,nanvl
> it says that lit cannot be resolved, and I found that there is no lit function
> in functions.py, although the documentation says it exists.
> In 2.1.1, the lit function is there and works very well.
> Where is the function, please? Thanks.



[jira] [Comment Edited] (SPARK-21847) Where is the lit() function in pyspark 2.2.0?

2017-08-27 Thread Liang-Chi Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143343#comment-16143343
 ] 

Liang-Chi Hsieh edited comment on SPARK-21847 at 8/28/17 4:09 AM:
--

{code}
>>> from pyspark.sql.functions import lit,nanvl
>>> lit(1)
Column<1>
{code}

I can't reproduce it. Maybe you can provide more information or the error message.

The Python function {{lit}} is created by {{_create_function}}.
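
For illustration only, here is a minimal plain-Python sketch of the general pattern of generating functions from a table of names and installing them into the module namespace at import time. This may explain why searching functions.py for a literal definition of {{lit}} finds nothing. The names {{_FUNCTIONS}} and {{_make_function}} and the placeholder return value are hypothetical; this is not the PySpark source.

```python
# Illustrative sketch only: mimics the general "generate functions from a
# table of names" pattern. _FUNCTIONS and _make_function are hypothetical
# names for this example; this is not the PySpark implementation.

_FUNCTIONS = {
    "lit": "Creates a column of literal value.",
    "col": "Returns a column based on the given column name.",
}


def _make_function(name, doc):
    """Build a function called `name` with docstring `doc`."""
    def _(value):
        # A real implementation would delegate to the JVM; this sketch
        # just returns a readable placeholder string.
        return "%s(%r)" % (name, value)
    _.__name__ = name
    _.__doc__ = doc
    return _


# Install each generated function into the module namespace. After this
# loop `lit` can be called, even though no `def lit` appears in the file.
for _name, _doc in _FUNCTIONS.items():
    globals()[_name] = _make_function(_name, _doc)

print(lit(1))        # lit(1)
print(lit.__name__)  # lit
```

Names created this way exist only at runtime, so an IDE or static checker may report {{lit}} as unresolved even when the import itself works.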


was (Author: viirya):
{code}
>>> from pyspark.sql.functions import lit,nanvl
>>> lit(1)
Column<1>
{code}

I can't reproduce it.

The Python function {{lit}} is created by {{_create_function}}.

> Where is the lit() function in pyspark 2.2.0?
> -
>
> Key: SPARK-21847
> URL: https://issues.apache.org/jira/browse/SPARK-21847
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.2.0
>Reporter: Neil Huang
>
> When I do this:
> from pyspark.sql.functions import lit,nanvl
> it says that lit cannot be resolved, and I found that there is no lit function
> in functions.py, although the documentation says it exists.
> In 2.1.1, the lit function is there and works very well.
> Where is the function, please? Thanks.



[jira] [Commented] (SPARK-21847) Where is the lit() function in pyspark 2.2.0?

2017-08-27 Thread Liang-Chi Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143343#comment-16143343
 ] 

Liang-Chi Hsieh commented on SPARK-21847:
-

{code}
>>> from pyspark.sql.functions import lit,nanvl
>>> lit(1)
Column<1>
{code}

I can't reproduce it.

The Python function {{lit}} is created by {{_create_function}}.

> Where is the lit() function in pyspark 2.2.0?
> -
>
> Key: SPARK-21847
> URL: https://issues.apache.org/jira/browse/SPARK-21847
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.2.0
>Reporter: Neil Huang
>
> When I do this:
> from pyspark.sql.functions import lit,nanvl
> it says that lit cannot be resolved, and I found that there is no lit function
> in functions.py, although the documentation says it exists.
> In 2.1.1, the lit function is there and works very well.
> Where is the function, please? Thanks.



[jira] [Assigned] (SPARK-21583) Create a ColumnarBatch with ArrowColumnVectors for row based iteration

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21583:


Assignee: (was: Apache Spark)

> Create a ColumnarBatch with ArrowColumnVectors for row based iteration
> --
>
> Key: SPARK-21583
> URL: https://issues.apache.org/jira/browse/SPARK-21583
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bryan Cutler
>
> The existing {{ArrowColumnVector}} creates a read-only vector of Arrow data.  
> It would be useful to be able to create a {{ColumnarBatch}} to allow row 
> based iteration over multiple {{ArrowColumnVectors}}.  This would avoid extra 
> copying to translate column elements into rows and be more efficient memory 
> usage while increasing performance.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21583) Create a ColumnarBatch with ArrowColumnVectors for row based iteration

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21583:


Assignee: Apache Spark

> Create a ColumnarBatch with ArrowColumnVectors for row based iteration
> --
>
> Key: SPARK-21583
> URL: https://issues.apache.org/jira/browse/SPARK-21583
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bryan Cutler
>Assignee: Apache Spark
>
> The existing {{ArrowColumnVector}} creates a read-only vector of Arrow data.  
> It would be useful to be able to create a {{ColumnarBatch}} to allow row 
> based iteration over multiple {{ArrowColumnVectors}}.  This would avoid extra 
> copying to translate column elements into rows and be more efficient memory 
> usage while increasing performance.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21583) Create a ColumnarBatch with ArrowColumnVectors for row based iteration

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21583:


Assignee: Apache Spark

> Create a ColumnarBatch with ArrowColumnVectors for row based iteration
> --
>
> Key: SPARK-21583
> URL: https://issues.apache.org/jira/browse/SPARK-21583
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bryan Cutler
>Assignee: Apache Spark
>
> The existing {{ArrowColumnVector}} creates a read-only vector of Arrow data.  
> It would be useful to be able to create a {{ColumnarBatch}} to allow row 
> based iteration over multiple {{ArrowColumnVectors}}.  This would avoid extra 
> copying to translate column elements into rows and be more efficient memory 
> usage while increasing performance.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21583) Create a ColumnarBatch with ArrowColumnVectors for row based iteration

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21583:


Assignee: (was: Apache Spark)

> Create a ColumnarBatch with ArrowColumnVectors for row based iteration
> --
>
> Key: SPARK-21583
> URL: https://issues.apache.org/jira/browse/SPARK-21583
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bryan Cutler
>
> The existing {{ArrowColumnVector}} creates a read-only vector of Arrow data.  
> It would be useful to be able to create a {{ColumnarBatch}} to allow row 
> based iteration over multiple {{ArrowColumnVectors}}.  This would avoid extra 
> copying to translate column elements into rows and be more efficient memory 
> usage while increasing performance.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21583) Create a ColumnarBatch with ArrowColumnVectors for row based iteration

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21583:


Assignee: Apache Spark

> Create a ColumnarBatch with ArrowColumnVectors for row based iteration
> --
>
> Key: SPARK-21583
> URL: https://issues.apache.org/jira/browse/SPARK-21583
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bryan Cutler
>Assignee: Apache Spark
>
> The existing {{ArrowColumnVector}} creates a read-only vector of Arrow data.
> It would be useful to be able to create a {{ColumnarBatch}} to allow
> row-based iteration over multiple {{ArrowColumnVector}} instances. This would
> avoid the extra copying needed to translate column elements into rows, using
> memory more efficiently and improving performance.



[jira] [Assigned] (SPARK-21583) Create a ColumnarBatch with ArrowColumnVectors for row based iteration

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21583:


Assignee: (was: Apache Spark)

> Create a ColumnarBatch with ArrowColumnVectors for row based iteration
> --
>
> Key: SPARK-21583
> URL: https://issues.apache.org/jira/browse/SPARK-21583
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bryan Cutler
>
> The existing {{ArrowColumnVector}} creates a read-only vector of Arrow data.
> It would be useful to be able to create a {{ColumnarBatch}} to allow
> row-based iteration over multiple {{ArrowColumnVector}} instances. This would
> avoid the extra copying needed to translate column elements into rows, using
> memory more efficiently and improving performance.



[jira] [Assigned] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21835:


Assignee: Apache Spark

> RewritePredicateSubquery should not produce unresolved query plans
> --
>
> Key: SPARK-21835
> URL: https://issues.apache.org/jira/browse/SPARK-21835
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Liang-Chi Hsieh
>Assignee: Apache Spark
>
> {{RewritePredicateSubquery}} rewrites correlated subqueries into join operations.
> During structural integrity checking, I found that {{RewritePredicateSubquery}}
> can produce unresolved query plans due to conflicting attributes. We should not
> let {{RewritePredicateSubquery}} produce unresolved plans.



[jira] [Assigned] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21835:


Assignee: (was: Apache Spark)

> RewritePredicateSubquery should not produce unresolved query plans
> --
>
> Key: SPARK-21835
> URL: https://issues.apache.org/jira/browse/SPARK-21835
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Liang-Chi Hsieh
>
> {{RewritePredicateSubquery}} rewrites correlated subqueries into join operations.
> During structural integrity checking, I found that {{RewritePredicateSubquery}}
> can produce unresolved query plans due to conflicting attributes. We should not
> let {{RewritePredicateSubquery}} produce unresolved plans.



[jira] [Issue Comment Deleted] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans

2017-08-27 Thread Liang-Chi Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang-Chi Hsieh updated SPARK-21835:

Comment: was deleted

(was: Submitted PR at https://github.com/apache/spark/pull/19050)

> RewritePredicateSubquery should not produce unresolved query plans
> --
>
> Key: SPARK-21835
> URL: https://issues.apache.org/jira/browse/SPARK-21835
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Liang-Chi Hsieh
>
> {{RewritePredicateSubquery}} rewrites correlated subqueries into join operations.
> During structural integrity checking, I found that {{RewritePredicateSubquery}}
> can produce unresolved query plans due to conflicting attributes. We should not
> let {{RewritePredicateSubquery}} produce unresolved plans.



[jira] [Assigned] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21835:


Assignee: (was: Apache Spark)

> RewritePredicateSubquery should not produce unresolved query plans
> --
>
> Key: SPARK-21835
> URL: https://issues.apache.org/jira/browse/SPARK-21835
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Liang-Chi Hsieh
>
> {{RewritePredicateSubquery}} rewrites correlated subqueries into join operations.
> During structural integrity checking, I found that {{RewritePredicateSubquery}}
> can produce unresolved query plans due to conflicting attributes. We should not
> let {{RewritePredicateSubquery}} produce unresolved plans.



[jira] [Assigned] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21835:


Assignee: Apache Spark

> RewritePredicateSubquery should not produce unresolved query plans
> --
>
> Key: SPARK-21835
> URL: https://issues.apache.org/jira/browse/SPARK-21835
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Liang-Chi Hsieh
>Assignee: Apache Spark
>
> {{RewritePredicateSubquery}} rewrites correlated subqueries into join operations.
> During structural integrity checking, I found that {{RewritePredicateSubquery}}
> can produce unresolved query plans due to conflicting attributes. We should not
> let {{RewritePredicateSubquery}} produce unresolved plans.



[jira] [Assigned] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21835:


Assignee: Apache Spark

> RewritePredicateSubquery should not produce unresolved query plans
> --
>
> Key: SPARK-21835
> URL: https://issues.apache.org/jira/browse/SPARK-21835
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Liang-Chi Hsieh
>Assignee: Apache Spark
>
> {{RewritePredicateSubquery}} rewrites correlated subqueries into join operations.
> During structural integrity checking, I found that {{RewritePredicateSubquery}}
> can produce unresolved query plans due to conflicting attributes. We should not
> let {{RewritePredicateSubquery}} produce unresolved plans.



[jira] [Assigned] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21835:


Assignee: (was: Apache Spark)

> RewritePredicateSubquery should not produce unresolved query plans
> --
>
> Key: SPARK-21835
> URL: https://issues.apache.org/jira/browse/SPARK-21835
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Liang-Chi Hsieh
>
> {{RewritePredicateSubquery}} rewrites correlated subqueries into join operations.
> During structural integrity checking, I found that {{RewritePredicateSubquery}}
> can produce unresolved query plans due to conflicting attributes. We should not
> let {{RewritePredicateSubquery}} produce unresolved plans.



[jira] [Commented] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans

2017-08-27 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143321#comment-16143321
 ] 

Apache Spark commented on SPARK-21835:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/19050

> RewritePredicateSubquery should not produce unresolved query plans
> --
>
> Key: SPARK-21835
> URL: https://issues.apache.org/jira/browse/SPARK-21835
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Liang-Chi Hsieh
>
> {{RewritePredicateSubquery}} rewrites correlated subqueries into join operations.
> During structural integrity checking, I found that {{RewritePredicateSubquery}}
> can produce unresolved query plans due to conflicting attributes. We should not
> let {{RewritePredicateSubquery}} produce unresolved plans.



[jira] [Assigned] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21835:


Assignee: (was: Apache Spark)

> RewritePredicateSubquery should not produce unresolved query plans
> --
>
> Key: SPARK-21835
> URL: https://issues.apache.org/jira/browse/SPARK-21835
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Liang-Chi Hsieh
>
> {{RewritePredicateSubquery}} rewrites correlated subqueries into join operations.
> During structural integrity checking, I found that {{RewritePredicateSubquery}}
> can produce unresolved query plans due to conflicting attributes. We should not
> let {{RewritePredicateSubquery}} produce unresolved plans.



[jira] [Assigned] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21835:


Assignee: Apache Spark

> RewritePredicateSubquery should not produce unresolved query plans
> --
>
> Key: SPARK-21835
> URL: https://issues.apache.org/jira/browse/SPARK-21835
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Liang-Chi Hsieh
>Assignee: Apache Spark
>
> {{RewritePredicateSubquery}} rewrites correlated subqueries into join operations.
> During structural integrity checking, I found that {{RewritePredicateSubquery}}
> can produce unresolved query plans due to conflicting attributes. We should not
> let {{RewritePredicateSubquery}} produce unresolved plans.



[jira] [Commented] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans

2017-08-27 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143320#comment-16143320
 ] 

Apache Spark commented on SPARK-21835:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/19050

> RewritePredicateSubquery should not produce unresolved query plans
> --
>
> Key: SPARK-21835
> URL: https://issues.apache.org/jira/browse/SPARK-21835
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Liang-Chi Hsieh
>
> {{RewritePredicateSubquery}} rewrites correlated subqueries into join operations.
> During structural integrity checking, I found that {{RewritePredicateSubquery}}
> can produce unresolved query plans due to conflicting attributes. We should not
> let {{RewritePredicateSubquery}} produce unresolved plans.



[jira] [Assigned] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21835:


Assignee: (was: Apache Spark)

> RewritePredicateSubquery should not produce unresolved query plans
> --
>
> Key: SPARK-21835
> URL: https://issues.apache.org/jira/browse/SPARK-21835
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Liang-Chi Hsieh
>
> {{RewritePredicateSubquery}} rewrites correlated subqueries into join operations.
> During structural integrity checking, I found that {{RewritePredicateSubquery}}
> can produce unresolved query plans due to conflicting attributes. We should not
> let {{RewritePredicateSubquery}} produce unresolved plans.



[jira] [Assigned] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21835:


Assignee: Apache Spark

> RewritePredicateSubquery should not produce unresolved query plans
> --
>
> Key: SPARK-21835
> URL: https://issues.apache.org/jira/browse/SPARK-21835
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Liang-Chi Hsieh
>Assignee: Apache Spark
>
> {{RewritePredicateSubquery}} rewrites correlated subqueries into join operations.
> During structural integrity checking, I found that {{RewritePredicateSubquery}}
> can produce unresolved query plans due to conflicting attributes. We should not
> let {{RewritePredicateSubquery}} produce unresolved plans.



[jira] [Assigned] (SPARK-21818) MultivariateOnlineSummarizer.variance generate negative result

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21818:


Assignee: (was: Apache Spark)

> MultivariateOnlineSummarizer.variance generate negative result
> --
>
> Key: SPARK-21818
> URL: https://issues.apache.org/jira/browse/SPARK-21818
> Project: Spark
>  Issue Type: Bug
>  Components: ML, MLlib
>Affects Versions: 2.2.0
>Reporter: Weichen Xu
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Because of numerical error, MultivariateOnlineSummarizer.variance can
> produce a negative variance.
> This is a serious bug because many algorithms in MLlib use a stddev computed as
> sqrt(variance); a negative variance yields NaN and crashes the whole algorithm.
> We can reproduce this bug with the following code:
> {code}
> import org.apache.spark.mllib.linalg.Vectors
> import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
>
> val summarizer1 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.7)
> val summarizer2 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.4)
> val summarizer3 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.5)
> val summarizer4 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.4)
> val summarizer = summarizer1
>   .merge(summarizer2)
>   .merge(summarizer3)
>   .merge(summarizer4)
> println(summarizer.variance(0))
> {code}
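
To illustrate why a slightly negative variance is harmful, here is a minimal, self-contained sketch that does not depend on Spark. The helper names (naiveVariance, safeVariance) are hypothetical, and the naive formula E[x^2] - E[x]^2 is used only as an example of a computation whose rounding error can dip below zero; it is not claimed to be the formula MultivariateOnlineSummarizer uses, nor the fix that was applied for this issue. Clamping at zero is one common guard, shown here under those assumptions.

```scala
object VarianceClampSketch {
  // Hypothetical helper: naive weighted variance via E[x^2] - E[x]^2.
  // With identical samples, rounding can make the result a tiny negative number.
  def naiveVariance(values: Seq[Double], weights: Seq[Double]): Double = {
    val wSum = weights.sum
    val mean = values.zip(weights).map { case (x, w) => x * w }.sum / wSum
    val meanSq = values.zip(weights).map { case (x, w) => x * x * w }.sum / wSum
    meanSq - mean * mean
  }

  // Clamp at zero so that a later sqrt never sees a negative input.
  def safeVariance(values: Seq[Double], weights: Seq[Double]): Double =
    math.max(naiveVariance(values, weights), 0.0)

  def main(args: Array[String]): Unit = {
    // Same values and weights as the reproduction in the issue description.
    val values = Seq(3.0, 3.0, 3.0, 3.0)
    val weights = Seq(0.7, 0.4, 0.5, 0.4)
    val v = safeVariance(values, weights)
    println(s"clamped variance = $v, stddev = ${math.sqrt(v)}")
    // sqrt of any negative input, however small, is NaN.
    println(s"sqrt of a tiny negative value: ${math.sqrt(-1e-17)}")
  }
}
```

The clamp hides the rounding error rather than removing it, so it suits cases where the true variance is known to be non-negative and only the sign of a near-zero result is in doubt.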



[jira] [Assigned] (SPARK-21818) MultivariateOnlineSummarizer.variance generate negative result

2017-08-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21818:


Assignee: Apache Spark

> MultivariateOnlineSummarizer.variance generate negative result
> --
>
> Key: SPARK-21818
> URL: https://issues.apache.org/jira/browse/SPARK-21818
> Project: Spark
>  Issue Type: Bug
>  Components: ML, MLlib
>Affects Versions: 2.2.0
>Reporter: Weichen Xu
>Assignee: Apache Spark
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Because of numerical error, MultivariateOnlineSummarizer.variance can
> produce a negative variance.
> This is a serious bug because many algorithms in MLlib use a stddev computed as
> sqrt(variance); a negative variance yields NaN and crashes the whole algorithm.
> We can reproduce this bug with the following code:
> {code}
> import org.apache.spark.mllib.linalg.Vectors
> import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
>
> val summarizer1 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.7)
> val summarizer2 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.4)
> val summarizer3 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.5)
> val summarizer4 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.4)
> val summarizer = summarizer1
>   .merge(summarizer2)
>   .merge(summarizer3)
>   .merge(summarizer4)
> println(summarizer.variance(0))
> {code}


