[jira] [Assigned] (SPARK-21818) MultivariateOnlineSummarizer.variance generate negative result
[ https://issues.apache.org/jira/browse/SPARK-21818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen reassigned SPARK-21818:
---------------------------------

    Assignee: Weichen Xu

> MultivariateOnlineSummarizer.variance generate negative result
> --------------------------------------------------------------
>
>                 Key: SPARK-21818
>                 URL: https://issues.apache.org/jira/browse/SPARK-21818
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, MLlib
>    Affects Versions: 2.2.0
>            Reporter: Weichen Xu
>            Assignee: Weichen Xu
>             Fix For: 2.3.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Because of numerical error, MultivariateOnlineSummarizer.variance can produce a negative variance.
> This is a serious bug because many algorithms in MLlib use the stddev computed from sqrt(variance);
> a negative variance yields NaN and crashes the whole algorithm.
> The bug can be reproduced with the following code:
> {code}
> import org.apache.spark.mllib.linalg.Vectors
> import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
>
> // Assumes the weighted add(sample, weight) overload is accessible,
> // e.g. from code inside Spark's own packages.
> // Four single-sample summarizers holding the same value with different weights.
> val summarizer1 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.7)
> val summarizer2 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.4)
> val summarizer3 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.5)
> val summarizer4 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.4)
> val summarizer = summarizer1
>   .merge(summarizer2)
>   .merge(summarizer3)
>   .merge(summarizer4)
> // Every sample is 3.0, so the variance should be 0.0 and never negative.
> println(summarizer.variance(0))
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-21818) MultivariateOnlineSummarizer.variance generate negative result
[ https://issues.apache.org/jira/browse/SPARK-21818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-21818.
-------------------------------

       Resolution: Fixed
    Fix Version/s: 2.3.0

Issue resolved by pull request 19029
[https://github.com/apache/spark/pull/19029]

> MultivariateOnlineSummarizer.variance generate negative result
> --------------------------------------------------------------
>
>                 Key: SPARK-21818
>                 URL: https://issues.apache.org/jira/browse/SPARK-21818
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, MLlib
>    Affects Versions: 2.2.0
>            Reporter: Weichen Xu
>             Fix For: 2.3.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Because of numerical error, MultivariateOnlineSummarizer.variance can produce a negative variance.
> This is a serious bug because many algorithms in MLlib use the stddev computed from sqrt(variance);
> a negative variance yields NaN and crashes the whole algorithm.
> The bug can be reproduced with the following code:
> {code}
> import org.apache.spark.mllib.linalg.Vectors
> import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
>
> // Assumes the weighted add(sample, weight) overload is accessible,
> // e.g. from code inside Spark's own packages.
> // Four single-sample summarizers holding the same value with different weights.
> val summarizer1 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.7)
> val summarizer2 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.4)
> val summarizer3 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.5)
> val summarizer4 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.4)
> val summarizer = summarizer1
>   .merge(summarizer2)
>   .merge(summarizer3)
>   .merge(summarizer4)
> // Every sample is 3.0, so the variance should be 0.0 and never negative.
> println(summarizer.variance(0))
> {code}
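
The failure mode in SPARK-21818 (a slightly negative variance turning into NaN under sqrt) can be illustrated without Spark. The sketch below is only one plausible guard, clamping at zero; it is not taken from pull request 19029, and the object and method names are hypothetical.

```scala
object VarianceGuard {
  // Hypothetical helper: clamp a computed variance at zero so that rounding
  // error cannot push it below the mathematically valid range.
  def clampVariance(rawVariance: Double): Double = math.max(rawVariance, 0.0)

  // Standard deviation derived from the clamped variance.
  def safeStddev(rawVariance: Double): Double =
    math.sqrt(clampVariance(rawVariance))

  def main(args: Array[String]): Unit = {
    // Stand-in for a rounding artifact; not a value produced by Spark.
    val slightlyNegative = -1.0e-17
    println(math.sqrt(slightlyNegative))  // prints NaN
    println(safeStddev(slightlyNegative)) // prints 0.0
  }
}
```

Clamping hides the rounding error rather than removing it, so it suits cases where the true variance is known to be non-negative and the negative value can only be numerical noise.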
[jira] [Commented] (SPARK-21255) NPE when creating encoder for enum
[ https://issues.apache.org/jira/browse/SPARK-21255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143428#comment-16143428 ]

Apache Spark commented on SPARK-21255:
--------------------------------------

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/19066

> NPE when creating encoder for enum
> ----------------------------------
>
>                 Key: SPARK-21255
>                 URL: https://issues.apache.org/jira/browse/SPARK-21255
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.1.0
>         Environment: org.apache.spark:spark-core_2.10:2.1.0
> org.apache.spark:spark-sql_2.10:2.1.0
>            Reporter: Mike
>            Assignee: Mike
>             Fix For: 2.3.0
>
>
> When you try to create an encoder for an Enum type (or a bean with an enum property)
> via Encoders.bean(...), it fails with a NullPointerException at TypeToken:495.
> I did a little research and it turns out that in JavaTypeInference:126 the following code
> {code:java}
> val beanInfo = Introspector.getBeanInfo(typeToken.getRawType)
> val properties = beanInfo.getPropertyDescriptors.filterNot(_.getName == "class")
> val fields = properties.map { property =>
>   val returnType = typeToken.method(property.getReadMethod).getReturnType
>   val (dataType, nullable) = inferDataType(returnType)
>   new StructField(property.getName, dataType, nullable)
> }
> (new StructType(fields), true)
> {code}
> filters out properties named "class", because we wouldn't want to serialize that.
> But enum types have another property of type Class, named "declaringClass", which
> we then try to inspect recursively. Eventually we try to inspect the ClassLoader class,
> which has a property "defaultAssertionStatus" with no read method, and that leads to
> the NPE at TypeToken:495.
> I think adding the property name "declaringClass" to the filter will resolve this.
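
The "declaringClass" property described above can be observed with the JDK's own `java.beans.Introspector`, independent of Spark. This is a minimal sketch: the object name, the helper names, and the choice of `java.util.concurrent.TimeUnit` as the sample enum are assumptions for illustration, and the filter follows the reporter's suggestion rather than the change in the linked pull request.

```scala
import java.beans.Introspector
import java.util.concurrent.TimeUnit

object EnumBeanProperties {
  // Property names to skip: "class" as in the quoted snippet, plus the
  // "declaringClass" property suggested by the reporter.
  private val skipped = Set("class", "declaringClass")

  // All bean property names the JDK introspector reports for a class.
  def propertyNames(cls: Class[_]): Seq[String] =
    Introspector.getBeanInfo(cls).getPropertyDescriptors.map(_.getName).toSeq

  // Property names left after applying the extended filter.
  def serializableProperties(cls: Class[_]): Seq[String] =
    propertyNames(cls).filterNot(skipped.contains)

  def main(args: Array[String]): Unit = {
    println(propertyNames(classOf[TimeUnit]))
    println(serializableProperties(classOf[TimeUnit]))
  }
}
```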
[jira] [Assigned] (SPARK-21729) Generic test for ProbabilisticClassifier to ensure consistent output columns
[ https://issues.apache.org/jira/browse/SPARK-21729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-21729:
------------------------------------

    Assignee: Apache Spark

> Generic test for ProbabilisticClassifier to ensure consistent output columns
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-21729
>                 URL: https://issues.apache.org/jira/browse/SPARK-21729
>             Project: Spark
>          Issue Type: Test
>          Components: ML
>    Affects Versions: 2.2.0
>            Reporter: Joseph K. Bradley
>            Assignee: Apache Spark
>
> One challenge with the ProbabilisticClassifier abstraction is that it introduces
> different code paths for predictions depending on which output columns are turned
> on or off: probability, rawPrediction, prediction. We ran into a bug in MLOR with this.
> This task is for adding a generic test usable in all test suites for
> ProbabilisticClassifier types which does the following:
> * Take a dataset + Estimator
> * Fit the Estimator
> * Test prediction using the model with all combinations of output columns turned on/off.
> * Make sure the output column values match, presumably by comparing vs. the case
>   with all 3 output columns turned on
> CC [~WeichenXu123] since this came up in https://github.com/apache/spark/pull/17373
[jira] [Assigned] (SPARK-21729) Generic test for ProbabilisticClassifier to ensure consistent output columns
[ https://issues.apache.org/jira/browse/SPARK-21729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-21729:
------------------------------------

    Assignee:     (was: Apache Spark)

> Generic test for ProbabilisticClassifier to ensure consistent output columns
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-21729
>                 URL: https://issues.apache.org/jira/browse/SPARK-21729
>             Project: Spark
>          Issue Type: Test
>          Components: ML
>    Affects Versions: 2.2.0
>            Reporter: Joseph K. Bradley
>
> One challenge with the ProbabilisticClassifier abstraction is that it introduces
> different code paths for predictions depending on which output columns are turned
> on or off: probability, rawPrediction, prediction. We ran into a bug in MLOR with this.
> This task is for adding a generic test usable in all test suites for
> ProbabilisticClassifier types which does the following:
> * Take a dataset + Estimator
> * Fit the Estimator
> * Test prediction using the model with all combinations of output columns turned on/off.
> * Make sure the output column values match, presumably by comparing vs. the case
>   with all 3 output columns turned on
> CC [~WeichenXu123] since this came up in https://github.com/apache/spark/pull/17373
[jira] [Commented] (SPARK-21729) Generic test for ProbabilisticClassifier to ensure consistent output columns
[ https://issues.apache.org/jira/browse/SPARK-21729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143420#comment-16143420 ]

Apache Spark commented on SPARK-21729:
--------------------------------------

User 'WeichenXu123' has created a pull request for this issue:
https://github.com/apache/spark/pull/19065

> Generic test for ProbabilisticClassifier to ensure consistent output columns
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-21729
>                 URL: https://issues.apache.org/jira/browse/SPARK-21729
>             Project: Spark
>          Issue Type: Test
>          Components: ML
>    Affects Versions: 2.2.0
>            Reporter: Joseph K. Bradley
>
> One challenge with the ProbabilisticClassifier abstraction is that it introduces
> different code paths for predictions depending on which output columns are turned
> on or off: probability, rawPrediction, prediction. We ran into a bug in MLOR with this.
> This task is for adding a generic test usable in all test suites for
> ProbabilisticClassifier types which does the following:
> * Take a dataset + Estimator
> * Fit the Estimator
> * Test prediction using the model with all combinations of output columns turned on/off.
> * Make sure the output column values match, presumably by comparing vs. the case
>   with all 3 output columns turned on
> CC [~WeichenXu123] since this came up in https://github.com/apache/spark/pull/17373
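
The "all combinations of output columns turned on/off" step in the task description can be enumerated mechanically. A minimal sketch, assuming only the three column names from the description; the object and method names are hypothetical, and the fitting and comparison steps, which need a real Estimator, are left out.

```scala
object OutputColumnCombinations {
  // The three output columns named in the issue description.
  val outputColumns: Seq[String] =
    Seq("probability", "rawPrediction", "prediction")

  // Every on/off assignment of the columns: 2^3 = 8 maps from name to flag.
  def allCombinations: List[Map[String, Boolean]] =
    outputColumns.toSet.subsets().map { enabled =>
      outputColumns.map(col => col -> enabled.contains(col)).toMap
    }.toList

  def main(args: Array[String]): Unit =
    allCombinations.foreach(println)
}
```

A generic test could loop over these maps, configure the model accordingly, and compare each enabled column against the run where all three are on.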
[jira] [Assigned] (SPARK-21839) Support SQL config for ORC compression
[ https://issues.apache.org/jira/browse/SPARK-21839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-21839:
------------------------------------

    Assignee: Apache Spark

> Support SQL config for ORC compression
> ---------------------------------------
>
>                 Key: SPARK-21839
>                 URL: https://issues.apache.org/jira/browse/SPARK-21839
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Dongjoon Hyun
>            Assignee: Apache Spark
>
> This issue aims to provide `spark.sql.orc.compression.codec` like
> `spark.sql.parquet.compression.codec`.
[jira] [Assigned] (SPARK-21839) Support SQL config for ORC compression
[ https://issues.apache.org/jira/browse/SPARK-21839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-21839:
------------------------------------

    Assignee:     (was: Apache Spark)

> Support SQL config for ORC compression
> ---------------------------------------
>
>                 Key: SPARK-21839
>                 URL: https://issues.apache.org/jira/browse/SPARK-21839
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Dongjoon Hyun
>
> This issue aims to provide `spark.sql.orc.compression.codec` like
> `spark.sql.parquet.compression.codec`.
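
For context, the sketch below shows how the two keys named in the issue might be set on a session. It needs Spark on the classpath; the local master, the application name, and the "snappy" value are assumptions chosen for illustration, and the ORC key only takes effect in a Spark version that actually implements this proposal.

```scala
import org.apache.spark.sql.SparkSession

object OrcCompressionConfig {
  def main(args: Array[String]): Unit = {
    // Assumed local session, for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("orc-compression-sketch")
      .getOrCreate()

    // Existing key that the issue uses as its model.
    spark.conf.set("spark.sql.parquet.compression.codec", "snappy")
    // Key proposed by SPARK-21839; the codec value is an assumption.
    spark.conf.set("spark.sql.orc.compression.codec", "snappy")

    spark.stop()
  }
}
```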
[jira] [Assigned] (SPARK-21848) Create trait to identify user-defined functions
[ https://issues.apache.org/jira/browse/SPARK-21848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-21848:
------------------------------------

    Assignee:     (was: Apache Spark)

> Create trait to identify user-defined functions
> -----------------------------------------------
>
>                 Key: SPARK-21848
>                 URL: https://issues.apache.org/jira/browse/SPARK-21848
>             Project: Spark
>          Issue Type: Task
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Gengliang Wang
>            Priority: Minor
>
> Create a trait to make it easier to identify which expressions are
> user-defined functions.
[jira] [Assigned] (SPARK-21848) Create trait to identify user-defined functions
[ https://issues.apache.org/jira/browse/SPARK-21848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-21848:
------------------------------------

    Assignee: Apache Spark

> Create trait to identify user-defined functions
> -----------------------------------------------
>
>                 Key: SPARK-21848
>                 URL: https://issues.apache.org/jira/browse/SPARK-21848
>             Project: Spark
>          Issue Type: Task
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Gengliang Wang
>            Assignee: Apache Spark
>            Priority: Minor
>
> Create a trait to make it easier to identify which expressions are
> user-defined functions.
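
The idea of a marker trait can be sketched in a few lines of plain Scala. Everything here is hypothetical: the trait and class names are stand-ins rather than Spark's Catalyst classes, and the example only shows how mixing in a marker lets callers pick out user-defined expressions with a single pattern match.

```scala
object UdfMarkerSketch {
  // Stand-in for an expression type; not Spark's Expression.
  trait Expression { def name: String }

  // Hypothetical marker trait: carries no members, only identity.
  trait UserDefinedMarker

  case class BuiltinCall(name: String) extends Expression
  case class UdfCall(name: String) extends Expression with UserDefinedMarker

  // Select the expressions that mix in the marker.
  def userDefined(exprs: Seq[Expression]): Seq[Expression] =
    exprs.collect { case e: UserDefinedMarker => e }

  def main(args: Array[String]): Unit = {
    val exprs = Seq(BuiltinCall("abs"), UdfCall("myUdf"), BuiltinCall("upper"))
    println(userDefined(exprs).map(_.name)) // prints List(myUdf)
  }
}
```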
[jira] [Assigned] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans
[ https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-21835:

    Assignee: Apache Spark

> RewritePredicateSubquery should not produce unresolved query plans
>
> Key: SPARK-21835
> URL: https://issues.apache.org/jira/browse/SPARK-21835
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.0
> Reporter: Liang-Chi Hsieh
> Assignee: Apache Spark
>
> {{RewritePredicateSubquery}} rewrites correlated subqueries into join operations.
> While checking structural integrity, I found that {{RewritePredicateSubquery}} can
> produce unresolved query plans due to conflicting attributes. We should not
> let {{RewritePredicateSubquery}} produce unresolved plans.
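The "conflicting attributes" problem described in SPARK-21835 can be illustrated with a toy model. The sketch below is not Catalyst code: `Attr`, `Relation`, `SemiJoin` and `Dedup` are hypothetical names, and re-aliasing the right side with fresh ids is only one plausible way to keep the rewritten join unambiguous, not necessarily what the eventual patch does.

```scala
// Toy model only; these are hypothetical classes, not Spark's Catalyst API.
final case class Attr(name: String, id: Long)
final case class Relation(output: Seq[Attr])
final case class SemiJoin(left: Relation, right: Relation, condition: (Attr, Attr))

object Dedup {
  private var nextId = 1000L
  private def freshId(): Long = { nextId += 1; nextId }

  // A join is ambiguous when both sides expose an attribute with the same id.
  def conflicts(left: Relation, right: Relation): Set[Long] =
    left.output.map(_.id).toSet.intersect(right.output.map(_.id).toSet)

  // Rewrite `lKey IN (SELECT rKey FROM right)` as a semi join, giving a fresh id
  // to every right-side attribute whose id collides with the left side.
  def rewriteIn(left: Relation, lKey: Attr, right: Relation, rKey: Attr): SemiJoin = {
    val clash = conflicts(left, right)
    val mapping: Map[Attr, Attr] = right.output.map { a =>
      a -> (if (clash(a.id)) a.copy(id = freshId()) else a)
    }.toMap
    SemiJoin(left, Relation(right.output.map(mapping)), (lKey, mapping.getOrElse(rKey, rKey)))
  }
}
```

A self-join is the case that triggers the conflict in this model: without the re-aliasing step, both sides of the join condition would refer to the same attribute id.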
[jira] [Assigned] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans
[ https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-21835:

    Assignee: (was: Apache Spark)

> RewritePredicateSubquery should not produce unresolved query plans
>
> Key: SPARK-21835
> URL: https://issues.apache.org/jira/browse/SPARK-21835
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.0
> Reporter: Liang-Chi Hsieh
>
> {{RewritePredicateSubquery}} rewrites correlated subqueries into join operations.
> While checking structural integrity, I found that {{RewritePredicateSubquery}} can
> produce unresolved query plans due to conflicting attributes. We should not
> let {{RewritePredicateSubquery}} produce unresolved plans.
[jira] [Commented] (SPARK-21848) Create trait to identify user-defined functions
[ https://issues.apache.org/jira/browse/SPARK-21848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143376#comment-16143376 ]

Apache Spark commented on SPARK-21848:

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/19064

> Create trait to identify user-defined functions
>
> Key: SPARK-21848
> URL: https://issues.apache.org/jira/browse/SPARK-21848
> Project: Spark
> Issue Type: Task
> Components: SQL
> Affects Versions: 2.2.0
> Reporter: Gengliang Wang
> Priority: Minor
>
> Create a trait to make it easier to identify which expressions are
> user-defined functions.
[jira] [Assigned] (SPARK-21583) Create a ColumnarBatch with ArrowColumnVectors for row based iteration
[ https://issues.apache.org/jira/browse/SPARK-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-21583:

    Assignee: (was: Apache Spark)

> Create a ColumnarBatch with ArrowColumnVectors for row based iteration
>
> Key: SPARK-21583
> URL: https://issues.apache.org/jira/browse/SPARK-21583
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: Bryan Cutler
>
> The existing {{ArrowColumnVector}} creates a read-only vector of Arrow data.
> It would be useful to be able to create a {{ColumnarBatch}} to allow
> row-based iteration over multiple {{ArrowColumnVectors}}. This would avoid extra
> copying to translate column elements into rows and would use memory more
> efficiently while improving performance.
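The idea of iterating rows over column vectors without copying, as requested in SPARK-21583, can be sketched in plain Scala. The classes below (`ColumnVector`, `IntArrayVector`, `RowView`, `Batch`) are hypothetical stand-ins chosen for illustration; they are not Spark's `ArrowColumnVector` or `ColumnarBatch`, and they handle only `Int` columns.

```scala
// Hypothetical stand-ins for illustration; not Spark's actual classes.
trait ColumnVector {
  def getInt(rowId: Int): Int
}

// Read-only vector backed by an existing array; the data is never copied.
final class IntArrayVector(data: Array[Int]) extends ColumnVector {
  def getInt(rowId: Int): Int = data(rowId)
}

// A row is just (columns, row index); field access reads the column in place.
final class RowView(columns: IndexedSeq[ColumnVector], rowId: Int) {
  def getInt(ordinal: Int): Int = columns(ordinal).getInt(rowId)
}

final class Batch(columns: IndexedSeq[ColumnVector], val numRows: Int) {
  // Lazily yields one lightweight view per row instead of materialising rows.
  def rowIterator: Iterator[RowView] =
    Iterator.range(0, numRows).map(i => new RowView(columns, i))
}
```

In this sketch each `RowView` holds only a reference to the columns and an index, so the cost of row-based iteration is independent of the row width.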
[jira] [Assigned] (SPARK-21583) Create a ColumnarBatch with ArrowColumnVectors for row based iteration
[ https://issues.apache.org/jira/browse/SPARK-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-21583:

    Assignee: Apache Spark

> Create a ColumnarBatch with ArrowColumnVectors for row based iteration
>
> Key: SPARK-21583
> URL: https://issues.apache.org/jira/browse/SPARK-21583
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: Bryan Cutler
> Assignee: Apache Spark
>
> The existing {{ArrowColumnVector}} creates a read-only vector of Arrow data.
> It would be useful to be able to create a {{ColumnarBatch}} to allow
> row-based iteration over multiple {{ArrowColumnVectors}}. This would avoid extra
> copying to translate column elements into rows and would use memory more
> efficiently while improving performance.
[jira] [Created] (SPARK-21848) Create trait to identify user-defined functions
Gengliang Wang created SPARK-21848:

Summary: Create trait to identify user-defined functions
Key: SPARK-21848
URL: https://issues.apache.org/jira/browse/SPARK-21848
Project: Spark
Issue Type: Task
Components: SQL
Affects Versions: 2.2.0
Reporter: Gengliang Wang
Priority: Minor

Create a trait to make it easier to identify which expressions are user-defined functions.
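As a rough illustration of what such a trait buys, here is a minimal sketch on a toy expression tree. Every name in it (`Expression`, `UserDefinedExpression`, `ScalaUdf`, `HiveUdf`, `UdfFinder`) is an assumption made for this example and should not be read as the API introduced by the pull request.

```scala
// Toy expression tree; all names are hypothetical, not Spark's classes.
sealed trait Expression {
  def children: Seq[Expression]
  // Pre-order traversal collecting every node the partial function matches.
  def collect[A](pf: PartialFunction[Expression, A]): Seq[A] =
    pf.lift(this).toSeq ++ children.flatMap(_.collect(pf))
}

// Marker trait: mixing it in is all that is needed to tag an expression as a UDF.
trait UserDefinedExpression { def name: String }

case class Literal(value: Any) extends Expression {
  val children: Seq[Expression] = Nil
}
case class Add(left: Expression, right: Expression) extends Expression {
  val children: Seq[Expression] = Seq(left, right)
}
case class ScalaUdf(name: String, children: Seq[Expression])
  extends Expression with UserDefinedExpression
case class HiveUdf(name: String, children: Seq[Expression])
  extends Expression with UserDefinedExpression

object UdfFinder {
  // A single type pattern covers every UDF flavour, present or future.
  def udfNames(e: Expression): Seq[String] =
    e.collect { case u: UserDefinedExpression => u.name }
}
```

Without the marker trait, `udfNames` would need one `case` per UDF class and would silently miss any class added later.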
[jira] [Assigned] (SPARK-21583) Create a ColumnarBatch with ArrowColumnVectors for row based iteration
[ https://issues.apache.org/jira/browse/SPARK-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21583: Assignee: (was: Apache Spark) > Create a ColumnarBatch with ArrowColumnVectors for row based iteration > -- > > Key: SPARK-21583 > URL: https://issues.apache.org/jira/browse/SPARK-21583 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Bryan Cutler > > The existing {{ArrowColumnVector}} creates a read-only vector of Arrow data. > It would be useful to be able to create a {{ColumnarBatch}} to allow row > based iteration over multiple {{ArrowColumnVectors}}. This would avoid extra > copying to translate column elements into rows and be more efficient memory > usage while increasing performance. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21583) Create a ColumnarBatch with ArrowColumnVectors for row based iteration
[ https://issues.apache.org/jira/browse/SPARK-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21583: Assignee: Apache Spark > Create a ColumnarBatch with ArrowColumnVectors for row based iteration > -- > > Key: SPARK-21583 > URL: https://issues.apache.org/jira/browse/SPARK-21583 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Bryan Cutler >Assignee: Apache Spark > > The existing {{ArrowColumnVector}} creates a read-only vector of Arrow data. > It would be useful to be able to create a {{ColumnarBatch}} to allow row > based iteration over multiple {{ArrowColumnVectors}}. This would avoid extra > copying to translate column elements into rows and be more efficient memory > usage while increasing performance. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21583) Create a ColumnarBatch with ArrowColumnVectors for row based iteration
[ https://issues.apache.org/jira/browse/SPARK-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21583: Assignee: Apache Spark > Create a ColumnarBatch with ArrowColumnVectors for row based iteration > -- > > Key: SPARK-21583 > URL: https://issues.apache.org/jira/browse/SPARK-21583 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Bryan Cutler >Assignee: Apache Spark > > The existing {{ArrowColumnVector}} creates a read-only vector of Arrow data. > It would be useful to be able to create a {{ColumnarBatch}} to allow row > based iteration over multiple {{ArrowColumnVectors}}. This would avoid extra > copying to translate column elements into rows and be more efficient memory > usage while increasing performance. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21583) Create a ColumnarBatch with ArrowColumnVectors for row based iteration
[ https://issues.apache.org/jira/browse/SPARK-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21583: Assignee: (was: Apache Spark) > Create a ColumnarBatch with ArrowColumnVectors for row based iteration > -- > > Key: SPARK-21583 > URL: https://issues.apache.org/jira/browse/SPARK-21583 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Bryan Cutler > > The existing {{ArrowColumnVector}} creates a read-only vector of Arrow data. > It would be useful to be able to create a {{ColumnarBatch}} to allow row > based iteration over multiple {{ArrowColumnVectors}}. This would avoid extra > copying to translate column elements into rows and be more efficient memory > usage while increasing performance. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21583) Create a ColumnarBatch with ArrowColumnVectors for row based iteration
[ https://issues.apache.org/jira/browse/SPARK-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21583: Assignee: (was: Apache Spark) > Create a ColumnarBatch with ArrowColumnVectors for row based iteration > -- > > Key: SPARK-21583 > URL: https://issues.apache.org/jira/browse/SPARK-21583 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Bryan Cutler > > The existing {{ArrowColumnVector}} creates a read-only vector of Arrow data. > It would be useful to be able to create a {{ColumnarBatch}} to allow row > based iteration over multiple {{ArrowColumnVectors}}. This would avoid extra > copying to translate column elements into rows and be more efficient memory > usage while increasing performance. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21583) Create a ColumnarBatch with ArrowColumnVectors for row based iteration
[ https://issues.apache.org/jira/browse/SPARK-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21583: Assignee: Apache Spark > Create a ColumnarBatch with ArrowColumnVectors for row based iteration > -- > > Key: SPARK-21583 > URL: https://issues.apache.org/jira/browse/SPARK-21583 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Bryan Cutler >Assignee: Apache Spark > > The existing {{ArrowColumnVector}} creates a read-only vector of Arrow data. > It would be useful to be able to create a {{ColumnarBatch}} to allow row > based iteration over multiple {{ArrowColumnVectors}}. This would avoid extra > copying to translate column elements into rows and be more efficient memory > usage while increasing performance. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21583) Create a ColumnarBatch with ArrowColumnVectors for row based iteration
[ https://issues.apache.org/jira/browse/SPARK-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21583: Assignee: (was: Apache Spark) > Create a ColumnarBatch with ArrowColumnVectors for row based iteration > -- > > Key: SPARK-21583 > URL: https://issues.apache.org/jira/browse/SPARK-21583 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Bryan Cutler > > The existing {{ArrowColumnVector}} creates a read-only vector of Arrow data. > It would be useful to be able to create a {{ColumnarBatch}} to allow row > based iteration over multiple {{ArrowColumnVectors}}. This would avoid extra > copying to translate column elements into rows and be more efficient memory > usage while increasing performance. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21583) Create a ColumnarBatch with ArrowColumnVectors for row based iteration
[ https://issues.apache.org/jira/browse/SPARK-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21583: Assignee: Apache Spark > Create a ColumnarBatch with ArrowColumnVectors for row based iteration > -- > > Key: SPARK-21583 > URL: https://issues.apache.org/jira/browse/SPARK-21583 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Bryan Cutler >Assignee: Apache Spark > > The existing {{ArrowColumnVector}} creates a read-only vector of Arrow data. > It would be useful to be able to create a {{ColumnarBatch}} to allow row > based iteration over multiple {{ArrowColumnVectors}}. This would avoid extra > copying to translate column elements into rows and be more efficient memory > usage while increasing performance. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21583) Create a ColumnarBatch with ArrowColumnVectors for row based iteration
[ https://issues.apache.org/jira/browse/SPARK-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21583: Assignee: (was: Apache Spark) > Create a ColumnarBatch with ArrowColumnVectors for row based iteration > -- > > Key: SPARK-21583 > URL: https://issues.apache.org/jira/browse/SPARK-21583 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Bryan Cutler > > The existing {{ArrowColumnVector}} creates a read-only vector of Arrow data. > It would be useful to be able to create a {{ColumnarBatch}} to allow row > based iteration over multiple {{ArrowColumnVectors}}. This would avoid extra > copying to translate column elements into rows and be more efficient memory > usage while increasing performance. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21583) Create a ColumnarBatch with ArrowColumnVectors for row based iteration
[ https://issues.apache.org/jira/browse/SPARK-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21583: Assignee: Apache Spark > Create a ColumnarBatch with ArrowColumnVectors for row based iteration > -- > > Key: SPARK-21583 > URL: https://issues.apache.org/jira/browse/SPARK-21583 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Bryan Cutler >Assignee: Apache Spark > > The existing {{ArrowColumnVector}} creates a read-only vector of Arrow data. > It would be useful to be able to create a {{ColumnarBatch}} to allow row > based iteration over multiple {{ArrowColumnVectors}}. This would avoid extra > copying to translate column elements into rows and be more efficient memory > usage while increasing performance. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans
[ https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21835: Assignee: Apache Spark > RewritePredicateSubquery should not produce unresolved query plans > -- > > Key: SPARK-21835 > URL: https://issues.apache.org/jira/browse/SPARK-21835 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Liang-Chi Hsieh >Assignee: Apache Spark > > {{RewritePredicateSubquery}} rewrites correlated subqueries into join operations. > While checking structural integrity, I found that {{RewritePredicateSubquery}} can > produce unresolved query plans due to conflicting attributes. We should not > let {{RewritePredicateSubquery}} produce unresolved plans.
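To illustrate what "rewriting a correlated subquery into a join" means in general, the sketch below uses Python's built-in sqlite3 module. The tables and data are invented for illustration. SQLite has no semi-join syntax, so a join against a de-duplicated subquery stands in for one. The sketch does not model Spark's optimizer or the attribute conflict described in the issue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders(id INTEGER, customer_id INTEGER);
    CREATE TABLE customers(id INTEGER, active INTEGER);
    INSERT INTO orders VALUES (1, 10), (2, 20), (3, 10), (4, 30);
    INSERT INTO customers VALUES (10, 1), (20, 0), (30, 1), (30, 1);
    """
)

# Predicate subquery: the inner query refers to the outer row (o.customer_id).
correlated = conn.execute(
    """
    SELECT o.id FROM orders o
    WHERE EXISTS (
        SELECT 1 FROM customers c
        WHERE c.id = o.customer_id AND c.active = 1
    )
    ORDER BY o.id
    """
).fetchall()

# Join form: DISTINCT keeps each order at most once, as a semi join would.
joined = conn.execute(
    """
    SELECT o.id FROM orders o
    JOIN (SELECT DISTINCT id FROM customers WHERE active = 1) c
      ON c.id = o.customer_id
    ORDER BY o.id
    """
).fetchall()

print(correlated == joined)
```

Both tables expose a column named id, so each reference has to be qualified with an alias to stay unambiguous once the two sides meet in a join.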
[jira] [Assigned] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans
[ https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21835: Assignee: (was: Apache Spark) > RewritePredicateSubquery should not produce unresolved query plans > -- > > Key: SPARK-21835 > URL: https://issues.apache.org/jira/browse/SPARK-21835 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Liang-Chi Hsieh > > {{RewritePredicateSubquery}} rewrites correlated subqueries into join operations. > While checking structural integrity, I found that {{RewritePredicateSubquery}} can > produce unresolved query plans due to conflicting attributes. We should not > let {{RewritePredicateSubquery}} produce unresolved plans.
[jira] [Resolved] (SPARK-21847) Where is the lit() function in pyspark 2.2.0?
[ https://issues.apache.org/jira/browse/SPARK-21847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-21847. -- Resolution: Invalid Questions should go to the mailing list. I am resolving this, and I can't reproduce it either. > Where is the lit() function in pyspark 2.2.0? > - > > Key: SPARK-21847 > URL: https://issues.apache.org/jira/browse/SPARK-21847 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.2.0 >Reporter: Neil Huang > > When I do this: > from pyspark.sql.functions import lit,nanvl > it says that lit cannot be resolved, and I found there is no lit function in > functions.py, > but the docs say that it exists. > In 2.1.1, the lit function is there and works very well. > Where is the function? Thanks.
[jira] [Comment Edited] (SPARK-21847) Where is the lit() function in pyspark 2.2.0?
[ https://issues.apache.org/jira/browse/SPARK-21847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143343#comment-16143343 ] Liang-Chi Hsieh edited comment on SPARK-21847 at 8/28/17 4:09 AM: -- {code} >>> from pyspark.sql.functions import lit,nanvl >>> lit(1) Column<1> {code} I can't reproduce it. Maybe you can provide more information or the error message. The Python function {{lit}} is created by {{_create_function}}. was (Author: viirya): {code} >>> from pyspark.sql.functions import lit,nanvl >>> lit(1) Column<1> {code} I can't reproduce it. The Python function {{lit}} is created by {{_create_function}}. > Where is the lit() function in pyspark 2.2.0? > - > > Key: SPARK-21847 > URL: https://issues.apache.org/jira/browse/SPARK-21847 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.2.0 >Reporter: Neil Huang > > When I do this: > from pyspark.sql.functions import lit,nanvl > it says that lit cannot be resolved, and I found there is no lit function in > functions.py, > but the docs say that it exists. > In 2.1.1, the lit function is there and works very well. > Where is the function? Thanks.
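The remark that {{lit}} is created by {{_create_function}} suggests one possible reason the reporter could not find it: a function generated at import time never appears as a literal "def lit" in the source file, and static tools may report it as unresolved even though the import works. The sketch below is a simplified plain-Python imitation of that pattern, not code copied from PySpark. The table contents and the string return value are assumptions made for illustration.

```python
def _create_function(name, doc=""):
    """Build a module-level function for `name` (simplified imitation)."""

    def _(col):
        # Illustrative stand-in: the real PySpark helper returns a Column.
        return "Column<{}({!r})>".format(name, col)

    _.__name__ = name
    _.__doc__ = doc
    return _


# Hypothetical table mapping function names to their docstrings.
_functions = {
    "lit": "Creates a column of literal value.",
    "col": "Returns a column based on the given column name.",
}

# Functions are injected into the module namespace at import time,
# so the source file never contains a literal `def lit(...)`.
for _name, _doc in _functions.items():
    globals()[_name] = _create_function(_name, _doc)

print(lit(1))  # noqa: F821 - defined dynamically by the loop above
```

Searching the file for "def lit" finds nothing, yet the name resolves at run time once the module has been imported.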
[jira] [Commented] (SPARK-21847) Where is the lit() function in pyspark 2.2.0?
[ https://issues.apache.org/jira/browse/SPARK-21847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143343#comment-16143343 ] Liang-Chi Hsieh commented on SPARK-21847: - {code} >>> from pyspark.sql.functions import lit,nanvl >>> lit(1) Column<1> {code} I can't reproduce it. The Python function {{lit}} is created by {{_create_function}}. > Where is the lit() function in pyspark 2.2.0? > - > > Key: SPARK-21847 > URL: https://issues.apache.org/jira/browse/SPARK-21847 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.2.0 >Reporter: Neil Huang > > When I do this: > from pyspark.sql.functions import lit,nanvl > it says that lit cannot be resolved, and I found there is no lit function in > functions.py, > but the docs say that it exists. > In 2.1.1, the lit function is there and works very well. > Where is the function? Thanks.
[jira] [Issue Comment Deleted] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans
[ https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-21835: Comment: was deleted (was: Submitted PR at https://github.com/apache/spark/pull/19050) > RewritePredicateSubquery should not produce unresolved query plans > -- > > Key: SPARK-21835 > URL: https://issues.apache.org/jira/browse/SPARK-21835 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Liang-Chi Hsieh > > {{RewritePredicateSubquery}} rewrites correlated subqueries into join operations. > While checking structural integrity, I found that {{RewritePredicateSubquery}} can > produce unresolved query plans due to conflicting attributes. We should not > let {{RewritePredicateSubquery}} produce unresolved plans.
[jira] [Assigned] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans
[ https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21835: Assignee: (was: Apache Spark) > RewritePredicateSubquery should not produce unresolved query plans > -- > > Key: SPARK-21835 > URL: https://issues.apache.org/jira/browse/SPARK-21835 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Liang-Chi Hsieh > > {{RewritePredicateSubquery}} rewrites correlated subquery to join operations. > During the structural integrity, I found {[RewritePredicateSubquery}} can > produce unresolved query plans due to conflicting attributes. We should not > let {{RewritePredicateSubquery}} produce unresolved plans. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans
[ https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21835: Assignee: Apache Spark > RewritePredicateSubquery should not produce unresolved query plans > -- > > Key: SPARK-21835 > URL: https://issues.apache.org/jira/browse/SPARK-21835 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Liang-Chi Hsieh >Assignee: Apache Spark > > {{RewritePredicateSubquery}} rewrites correlated subquery to join operations. > During the structural integrity, I found {[RewritePredicateSubquery}} can > produce unresolved query plans due to conflicting attributes. We should not > let {{RewritePredicateSubquery}} produce unresolved plans. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans
[ https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143321#comment-16143321 ]

Apache Spark commented on SPARK-21835:

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/19050

> RewritePredicateSubquery should not produce unresolved query plans
> --
>
> Key: SPARK-21835
> URL: https://issues.apache.org/jira/browse/SPARK-21835
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.0
> Reporter: Liang-Chi Hsieh
>
> {{RewritePredicateSubquery}} rewrites correlated subqueries into join operations.
> During the structural integrity check, I found that {{RewritePredicateSubquery}} can
> produce unresolved query plans due to conflicting attributes. We should not
> let {{RewritePredicateSubquery}} produce unresolved plans.
[jira] [Assigned] (SPARK-21818) MultivariateOnlineSummarizer.variance generate negative result
[ https://issues.apache.org/jira/browse/SPARK-21818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-21818:

    Assignee: (was: Apache Spark)

> MultivariateOnlineSummarizer.variance generate negative result
> --
>
> Key: SPARK-21818
> URL: https://issues.apache.org/jira/browse/SPARK-21818
> Project: Spark
> Issue Type: Bug
> Components: ML, MLlib
> Affects Versions: 2.2.0
> Reporter: Weichen Xu
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> Because of numerical error, MultivariateOnlineSummarizer.variance can
> produce a negative variance.
> This is a serious bug because many algorithms in MLlib use the stddev computed as
> sqrt(variance); a negative variance yields NaN and crashes the whole algorithm.
> The bug can be reproduced with the following code:
> {code}
> import org.apache.spark.mllib.linalg.Vectors
> import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
>
> // Four summarizers, each holding the same value (3.0) with a different weight.
> val summarizer1 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.7)
> val summarizer2 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.4)
> val summarizer3 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.5)
> val summarizer4 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.4)
> // Merging them exposes the rounding error in the combined variance.
> val summarizer = summarizer1
>   .merge(summarizer2)
>   .merge(summarizer3)
>   .merge(summarizer4)
> println(summarizer.variance(0))
> {code}
[jira] [Assigned] (SPARK-21818) MultivariateOnlineSummarizer.variance generate negative result
[ https://issues.apache.org/jira/browse/SPARK-21818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-21818:

    Assignee: Apache Spark

> MultivariateOnlineSummarizer.variance generate negative result
> --
>
> Key: SPARK-21818
> URL: https://issues.apache.org/jira/browse/SPARK-21818
> Project: Spark
> Issue Type: Bug
> Components: ML, MLlib
> Affects Versions: 2.2.0
> Reporter: Weichen Xu
> Assignee: Apache Spark
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> Because of numerical error, MultivariateOnlineSummarizer.variance can
> produce a negative variance.
> This is a serious bug because many algorithms in MLlib use the stddev computed as
> sqrt(variance); a negative variance yields NaN and crashes the whole algorithm.
> The bug can be reproduced with the following code:
> {code}
> import org.apache.spark.mllib.linalg.Vectors
> import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
>
> // Four summarizers, each holding the same value (3.0) with a different weight.
> val summarizer1 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.7)
> val summarizer2 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.4)
> val summarizer3 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.5)
> val summarizer4 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.4)
> // Merging them exposes the rounding error in the combined variance.
> val summarizer = summarizer1
>   .merge(summarizer2)
>   .merge(summarizer3)
>   .merge(summarizer4)
> println(summarizer.variance(0))
> {code}
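
The failure mode reported in SPARK-21818 (a slightly negative variance turning into NaN under sqrt) can be illustrated without Spark. The following is only a minimal sketch under stated assumptions: the object VarianceGuard and its methods are hypothetical names, it uses a plain weighted population variance (E[x^2] - E[x]^2) rather than the formula in MultivariateOnlineSummarizer, and clamping at zero is shown as one common guard, not necessarily the change made in the linked pull request.

```scala
// Hypothetical helper, not part of Spark: shows why a variance that is
// mathematically >= 0 should be clamped before taking its square root.
object VarianceGuard {

  /** Weighted population variance computed as E[x^2] - E[x]^2.
    * Floating-point rounding can leave the result slightly below zero
    * when all values are (nearly) identical. */
  def rawWeightedVariance(values: Seq[Double], weights: Seq[Double]): Double = {
    require(values.nonEmpty && values.length == weights.length,
      "values and weights must be non-empty and of equal length")
    val totalWeight = weights.sum
    require(totalWeight > 0.0, "total weight must be positive")
    val pairs = values.zip(weights)
    val mean = pairs.map { case (v, w) => v * w }.sum / totalWeight
    val meanOfSquares = pairs.map { case (v, w) => v * v * w }.sum / totalWeight
    meanOfSquares - mean * mean
  }

  /** Clamp a computed variance at zero so that sqrt never sees a negative input. */
  def clampVariance(variance: Double): Double = math.max(variance, 0.0)

  /** Standard deviation derived from the clamped variance. */
  def safeStd(values: Seq[Double], weights: Seq[Double]): Double =
    math.sqrt(clampVariance(rawWeightedVariance(values, weights)))
}
```

With the values and weights from the report, VarianceGuard.safeStd(Seq(3.0, 3.0, 3.0, 3.0), Seq(0.7, 0.4, 0.5, 0.4)) returns a non-negative, non-NaN number regardless of which way the rounding error falls, whereas math.sqrt applied directly to a tiny negative variance returns NaN.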