[GitHub] [spark] HyukjinKwon commented on a change in pull request #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types

2020-01-09 Thread GitBox
HyukjinKwon commented on a change in pull request #27165: 
[SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move 
inconsistent pandas UDF types
URL: https://github.com/apache/spark/pull/27165#discussion_r365109561
 
 

 ##
 File path: python/pyspark/sql/pandas/group_ops.py
 ##
 @@ -28,6 +28,7 @@ class PandasGroupedOpsMixin(object):
 can use this class.
 """
 
+@since(2.3)
 
 Review comment:
   I'm piggybacking this change while I'm here.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types

2020-01-09 Thread GitBox
HyukjinKwon commented on a change in pull request #27165: 
[SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move 
inconsistent pandas UDF types
URL: https://github.com/apache/spark/pull/27165#discussion_r365109478
 
 

 ##
 File path: python/pyspark/sql/tests/test_pandas_cogrouped_map.py
 ##
 @@ -180,59 +176,26 @@ def left_assign_key(key, l, _):
 assert_frame_equal(expected, result, 
check_column_type=_check_column_type)
 
 def test_wrong_return_type(self):
-with QuietTest(self.sc):
-with self.assertRaisesRegexp(
-NotImplementedError,
-'Invalid returnType.*cogrouped map Pandas UDF.*MapType'):
-pandas_udf(
-lambda l, r: l,
-'id long, v map',
-PandasUDFType.COGROUPED_MAP)
-
-def test_wrong_args(self):
 # Test that we get a sensible exception invalid values passed to apply
 left = self.data1
 right = self.data2
 with QuietTest(self.sc):
-# Function rather than a udf
 
 Review comment:
   `groupby.cogroup.applyInPandas` now always sets the right eval type 
internally, so we no longer have to worry about a mis-typed UDF.





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27130: [SPARK-25993][SQL][TESTS] Add test cases for CREATE EXTERNAL TABLE with subdirectories

2020-01-09 Thread GitBox
dongjoon-hyun commented on a change in pull request #27130: 
[SPARK-25993][SQL][TESTS] Add test cases for CREATE EXTERNAL TABLE with 
subdirectories
URL: https://github.com/apache/spark/pull/27130#discussion_r365109226
 
 

 ##
 File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala
 ##
 @@ -170,4 +170,156 @@ class HiveOrcSourceSuite extends OrcSuite with 
TestHiveSingleton {
   test("SPARK-11412 read and merge orc schemas in parallel") {
 testMergeSchemasInParallel(OrcFileOperator.readOrcSchemasInParallel)
   }
+
+  test("SPARK-25993 CREATE EXTERNAL TABLE with subdirectories") {
+Seq(true, false).foreach { convertMetastore =>
+  withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> s"$convertMetastore") 
{
+withTempDir { dir =>
+  try {
+sql("USE default")
+sql(
+  """
+|CREATE EXTERNAL TABLE hive_orc(
 
 Review comment:
   I'm a little confused here.
   @kevinyu98, do you want a table created by Hive here?
   Usually, we use the table name `hive_orc` for such a table. Please see 
https://github.com/apache/spark/pull/27130/files#diff-a8c26a35def87a13e6b59db19d9fb8a1R68
 .
   
   Also, you are still using `hiveClient.runSqlHive` at line 192. I'm wondering 
what the test target of this PR is.





[GitHub] [spark] beliefer commented on issue #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression allows the use of t

2020-01-09 Thread GitBox
beliefer commented on issue #27058: [SPARK-30395][SQL] When one or more 
DISTINCT aggregate expressions operate on the same field, the DISTINCT 
aggregate expression allows the use of the FILTER clause
URL: https://github.com/apache/spark/pull/27058#issuecomment-572917240
 
 
   > Ah, also, can you put a simple explain example (showing how a plan with 
distinct aggregates is converted) in the PR description? It would be better to 
describe how this PR fixes the issue there, too.
   
   OK





[GitHub] [spark] beliefer commented on a change in pull request #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression

2020-01-09 Thread GitBox
beliefer commented on a change in pull request #27058: [SPARK-30395][SQL] When 
one or more DISTINCT aggregate expressions operate on the same field, the 
DISTINCT aggregate expression allows the use of the FILTER clause
URL: https://github.com/apache/spark/pull/27058#discussion_r365107970
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala
 ##
 @@ -316,6 +362,86 @@ object RewriteDistinctAggregates extends 
Rule[LogicalPlan] {
 }.asInstanceOf[NamedExpression]
   }
   Aggregate(groupByAttrs, patchedAggExpressions, firstAggregate)
+} else if (distinctAggGroups.size == 1) {
+  val (distinctAggExpressions, regularAggExpressions) = 
aggExpressions.partition(_.isDistinct)
+  if (distinctAggExpressions.exists(_.filter.isDefined)) {
+val regularAggExprs = regularAggExpressions.filter(e => 
e.children.exists(!_.foldable))
+val regularFunChildren = regularAggExprs
+  .flatMap(_.aggregateFunction.children.filter(!_.foldable))
+val regularFilterAttrs = regularAggExprs.flatMap(_.filterAttributes)
+val regularAggChildren = (regularFunChildren ++ 
regularFilterAttrs).distinct
+val regularAggChildAttrMap = 
regularAggChildren.map(expressionAttributePair)
+val regularAggChildAttrLookup = regularAggChildAttrMap.toMap
+val regularOperatorMap = regularAggExprs.map {
+  case ae @ AggregateExpression(af, _, _, filter, _) =>
+val newChildren = af.children.map(c => 
regularAggChildAttrLookup.getOrElse(c, c))
+val raf = 
af.withNewChildren(newChildren).asInstanceOf[AggregateFunction]
+val filterOpt = filter.map(_.transform {
+  case a: Attribute => regularAggChildAttrLookup.getOrElse(a, a)
+})
+val aggExpr = ae.copy(aggregateFunction = raf, filter = filterOpt)
+(ae, aggExpr)
+}
+val distinctAggExprs = distinctAggExpressions.filter(e => 
e.children.exists(!_.foldable))
+val rewriteDistinctOperatorMap = distinctAggExprs.zipWithIndex.map {
+  case (ae @ AggregateExpression(af, _, _, filter, _), i) =>
+// Why do we need to construct the phantom id ?
+// First, In order to reduce costs, it is better to handle the 
filter clause locally.
+// e.g. COUNT (DISTINCT a) FILTER (WHERE id > 1), evaluate 
expression
+// If(id > 1) 'a else null first, and use the result as output.
+// Second, If more than one DISTINCT aggregate expression uses the 
same column,
+// We need to construct the phantom attributes so as the output 
not lost.
+// e.g. SUM (DISTINCT a), COUNT (DISTINCT a) FILTER (WHERE id > 1) 
will output
+// attribute 'phantom1-a and attribute 'phantom2-a instead of two 
'a.
+// Note: We just need to illusion the expression with filter 
clause.
+// The illusionary mechanism may result in multiple distinct 
aggregations uses
+// different column, so we still need to call `rewrite`.
+val phantomId = i + 1
+val unfoldableChildren = af.children.filter(!_.foldable)
+val exprAttrs = unfoldableChildren.map { e =>
+  (e, AttributeReference(s"phantom$phantomId-${e.sql}", 
e.dataType, nullable = true)())
 
 Review comment:
   OK. Could I use `_gen_phantom_${exprId.id}` ?
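
For readers following the code comment in the hunk above, the idea of handling the FILTER clause locally can be sketched in plain Python. This is only an illustration of the semantics, not Spark's implementation: the row format, the helper names (`count_distinct_filtered`, `count_distinct_via_projection`, `phantom_name`) and the sample data are all assumptions made for this example.

```python
def count_distinct_filtered(rows, column, predicate):
    """COUNT(DISTINCT column) FILTER (WHERE predicate), evaluated directly."""
    return len({r[column] for r in rows
                if predicate(r) and r[column] is not None})


def count_distinct_via_projection(rows, column, predicate):
    """Same aggregate, with the filter folded into a projected column first."""
    # Step 1: project IF(predicate, column, NULL) into a separate column.
    projected = [r[column] if predicate(r) else None for r in rows]
    # Step 2: a plain COUNT(DISTINCT ...) skips NULLs, so no filter remains.
    return len({v for v in projected if v is not None})


def phantom_name(index, column):
    """Hypothetical naming helper mirroring s"phantom$phantomId-${e.sql}"."""
    return f"phantom{index + 1}-{column}"


# Hypothetical sample data: two aggregates read the same column `a`.
rows = [
    {"id": 1, "a": 10},
    {"id": 2, "a": 10},
    {"id": 3, "a": 20},
    {"id": 4, "a": None},
]


def keep(row):
    return row["id"] > 1


direct = count_distinct_filtered(rows, "a", keep)
rewritten = count_distinct_via_projection(rows, "a", keep)
print(direct, rewritten, phantom_name(0, "a"), phantom_name(1, "a"))
```

Because each DISTINCT aggregate gets its own projected column, `SUM(DISTINCT a)` and a filtered `COUNT(DISTINCT a)` no longer collide on the single attribute `a`, which is the motivation the code comment gives for the phantom attributes.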





[GitHub] [spark] HyukjinKwon commented on issue #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types

2020-01-09 Thread GitBox
HyukjinKwon commented on issue #27165: [SPARK-28264][PYTHON][SQL] Support type 
hints in pandas UDF and rename/move inconsistent pandas UDF types
URL: https://github.com/apache/spark/pull/27165#issuecomment-572917051
 
 
   cc @rxin, @zero323, @cloud-fan, @mengxr, @viirya, @ueshin, @BryanCutler, 
@icexelloss, @dongjoon-hyun FYI





[GitHub] [spark] AmplabJenkins removed a comment on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit

2020-01-09 Thread GitBox
AmplabJenkins removed a comment on issue #27159: [DO-NOT-MERGE] Testing fix for 
pyspark test test_memory_limit
URL: https://github.com/apache/spark/pull/27159#issuecomment-572916501
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit

2020-01-09 Thread GitBox
AmplabJenkins removed a comment on issue #27159: [DO-NOT-MERGE] Testing fix for 
pyspark test test_memory_limit
URL: https://github.com/apache/spark/pull/27159#issuecomment-572916510
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116450/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27030: [SPARK-30244][SQL] Emit pre/post events for "Partition" methods in ExternalCatalogWithListener

2020-01-09 Thread GitBox
AmplabJenkins removed a comment on issue #27030: [SPARK-30244][SQL] Emit 
pre/post events for "Partition" methods in ExternalCatalogWithListener
URL: https://github.com/apache/spark/pull/27030#issuecomment-572916406
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit

2020-01-09 Thread GitBox
AmplabJenkins commented on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark 
test test_memory_limit
URL: https://github.com/apache/spark/pull/27159#issuecomment-572916501
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] HyukjinKwon opened a new pull request #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types

2020-01-09 Thread GitBox
HyukjinKwon opened a new pull request #27165: [SPARK-28264][PYTHON][SQL] 
Support type hints in pandas UDF and rename/move inconsistent pandas UDF types
URL: https://github.com/apache/spark/pull/27165
 
 
   ### What changes were proposed in this pull request?
   
   This PR proposes to redesign pandas UDFs as described in [the 
proposal](https://docs.google.com/document/d/1-kV0FS_LF2zvaRh_GhkV32Uqksm_Sq8SvnBBmRyxm30/edit?usp=sharing).
   
   Note that this PR also addresses one of the future improvements described 
[here](https://docs.google.com/document/d/1-kV0FS_LF2zvaRh_GhkV32Uqksm_Sq8SvnBBmRyxm30/edit#heading=h.h3ncjpk6ujqu),
 "A couple of less-intuitive pandas UDF types" (by @zero323).
   
   In short,
   
   - A new, experimental way to define pandas UDFs with type hints, as an 
alternative to the existing one.
   ```python
   @pandas_udf(schema='...')
   def func(c1: Series, c2: Series) -> DataFrame:
       pass
   ```
   
   - Remove the three types below from `pandas_udf` and expose them as separate 
standalone APIs, so that `pandas_udf` is now consistent with regular `udf`s and 
other expressions.
   
   `df.mapInPandas(udf)`  -> `df.mapInPandas(func, schema)`
   `df.groupby.apply(udf)`  -> `df.groupby.applyInPandas(func, schema)`
   `df.groupby.cogroup.apply(udf)`  -> 
`df.groupby.cogroup.applyInPandas(func, schema)`
   
   - The existing ways are not deprecated for now.
   ```python
   @pandas_udf(schema='...', functionType=PandasUDFType.SCALAR)
   def func(c1, c2):
       pass
   ```
   If users are happy with this, I plan to deprecate the existing way and 
declare that using type hints is no longer experimental.
   
   One design goal of this PR was to avoid touching the internals (since we do 
not deprecate the old ways for now) while supporting type hints with minimal 
changes at the interface only.
   
   - Once we deprecate or remove the old ways, I think the internals will need 
another refactoring. At the very least, we should rename the internal pandas 
evaluation types.
   - If users find that the experimental type hints are not helpful, we can 
simply revert the changes at the interface level.
   
   ### Why are the changes needed?
   
   In order to address old design issues. Please see [the 
proposal](https://docs.google.com/document/d/1-kV0FS_LF2zvaRh_GhkV32Uqksm_Sq8SvnBBmRyxm30/edit?usp=sharing).
   
   ### Does this PR introduce any user-facing change?
   
   There are no behaviour changes.
   
   It adds new ways to use pandas UDFs by using type hints. See below.
   
   **SCALAR**:
   
   ```python
   @pandas_udf(schema='...')
   def func(c1: Series, c2: DataFrame) -> Series:
       pass  # DataFrame represents a struct column
   ```
   
   **SCALAR_ITER**:
   
   ```python
   @pandas_udf(schema='...')
   def func(iter: Iterator[Tuple[Series, DataFrame, ...]]) -> Iterator[Series]:
       pass  # Same as SCALAR but wrapped by Iterator
   ```
   
   **GROUPED_AGG**:
   
   ```python
   @pandas_udf(schema='...')
   def func(c1: Series, c2: DataFrame) -> int:
       pass  # DataFrame represents a struct column
   ```
   
   **GROUPED_MAP**:
   
   This was added in Spark 2.3 by SPARK-20396. As described above, the existing 
behaviour is kept; in addition, there is a new alias, `groupby.applyInPandas`, 
for `groupby.apply`. See the example below:
   
   
   ```python
   def func(pdf):
       return pdf
   
   df.groupby("...").applyInPandas(func, schema=df.schema)
   ```
   
   
   **MAP_ITER**:
   
   This was added in Spark 3.0 by SPARK-28198, and this PR replaces its 
usages. See the example below:
   
   ```python
   def func(iter):
       for df in iter:
           yield df
   
   df.mapInPandas(func, df.schema)
   ```
   
   
   **COGROUPED_MAP**:
   
   This was added in Spark 3.0 by SPARK-27463, and this PR replaces its 
usages. See the example below:
   
   ```python
   def asof_join(left, right):
       return pd.merge_asof(left, right, on="...", by="...")
   
   df1.groupby("...").cogroup(df2.groupby("...")).applyInPandas(
       asof_join, schema="...")
   ```
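
The three type-hinted variants above (SCALAR, SCALAR_ITER, GROUPED_AGG) differ only in their annotations, so the UDF kind can in principle be derived from the function signature. The following is a minimal sketch of that idea using only the standard library. It is not the code in this PR: `Series` and `DataFrame` are hypothetical stand-ins for the pandas classes, and `infer_udf_kind` and its returned labels are names assumed for the example.

```python
import collections.abc
from typing import Iterator, Tuple, get_origin, get_type_hints


class Series:  # hypothetical stand-in for pandas.Series
    pass


class DataFrame:  # hypothetical stand-in for pandas.DataFrame
    pass


def _is_iterator(hint):
    # typing.Iterator[X] reports collections.abc.Iterator as its origin.
    return get_origin(hint) is collections.abc.Iterator


def infer_udf_kind(func):
    """Guess the pandas UDF kind from annotations (illustrative rules only)."""
    hints = get_type_hints(func)
    return_hint = hints.pop("return", None)
    params = list(hints.values())
    if return_hint is None or not params:
        raise ValueError("arguments and return value must all be annotated")
    # Iterator in, Iterator out: the batched scalar variant.
    if len(params) == 1 and _is_iterator(params[0]) and _is_iterator(return_hint):
        return "SCALAR_ITER"
    if all(p in (Series, DataFrame) for p in params):
        # Column(s) in, column out: scalar; column(s) in, plain value out: agg.
        if return_hint in (Series, DataFrame):
            return "SCALAR"
        if not _is_iterator(return_hint):
            return "GROUPED_AGG"
    raise ValueError("unsupported combination of type hints")


def scalar(c1: Series, c2: DataFrame) -> Series:
    pass


def scalar_iter(it: Iterator[Tuple[Series, DataFrame]]) -> Iterator[Series]:
    pass


def grouped_agg(c1: Series, c2: DataFrame) -> int:
    pass


for f in (scalar, scalar_iter, grouped_agg):
    print(f.__name__, "->", infer_udf_kind(f))
```

A design like this keeps the decorator call free of a `functionType` argument, at the cost of requiring every argument and the return value to be annotated.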
   
   ### How was this patch tested?
   
   Unit tests were added and run against Python 2.7, 3.6 and 3.7.
   





[GitHub] [spark] AmplabJenkins removed a comment on issue #27030: [SPARK-30244][SQL] Emit pre/post events for "Partition" methods in ExternalCatalogWithListener

2020-01-09 Thread GitBox
AmplabJenkins removed a comment on issue #27030: [SPARK-30244][SQL] Emit 
pre/post events for "Partition" methods in ExternalCatalogWithListener
URL: https://github.com/apache/spark/pull/27030#issuecomment-572916416
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21251/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit

2020-01-09 Thread GitBox
AmplabJenkins commented on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark 
test test_memory_limit
URL: https://github.com/apache/spark/pull/27159#issuecomment-572916510
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116450/
   Test PASSed.





[GitHub] [spark] viirya commented on a change in pull request #26956: [SPARK-30312][SQL] Preserve path permission and acl when truncate table

2020-01-09 Thread GitBox
viirya commented on a change in pull request #26956: [SPARK-30312][SQL] 
Preserve path permission and acl when truncate table
URL: https://github.com/apache/spark/pull/26956#discussion_r365108284
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala
 ##
 @@ -1981,6 +1982,60 @@ abstract class DDLSuite extends QueryTest with 
SQLTestUtils {
 }
   }
 
+  test("SPARK-30312: truncate table - keep acl/permission") {
+import testImplicits._
+val ignorePermissionAcl = Seq(true, false)
+
+ignorePermissionAcl.foreach { ignore =>
+  withSQLConf(
+"fs.file.impl" -> classOf[FakeLocalFsFileSystem].getName,
+"fs.file.impl.disable.cache" -> "true",
+SQLConf.TRUNCATE_TABLE_IGNORE_PERMISSION_ACL.key -> ignore.toString) {
+withTable("tab1") {
+  sql("CREATE TABLE tab1 (col INT) USING parquet")
+  sql("INSERT INTO tab1 SELECT 1")
+  checkAnswer(spark.table("tab1"), Row(1))
+
+  val tablePath = new Path(spark.sessionState.catalog
+.getTableMetadata(TableIdentifier("tab1")).storage.locationUri.get)
+
+  val hadoopConf = spark.sessionState.newHadoopConf()
+  val fs = tablePath.getFileSystem(hadoopConf)
+  val fileStatus = fs.getFileStatus(tablePath);
+
+  fs.setPermission(tablePath, new FsPermission("777"))
+  assert(fileStatus.getPermission().toString() == "rwxrwxrwx")
+
+  // Set ACL to table path.
+  val customAcl = new java.util.ArrayList[AclEntry]()
+  customAcl.add(new AclEntry.Builder()
+.setType(AclEntryType.USER)
+.setScope(AclEntryScope.ACCESS)
+.setPermission(FsAction.READ).build())
+  fs.setAcl(tablePath, customAcl)
+  assert(fs.getAclStatus(tablePath).getEntries().get(0) == 
customAcl.get(0))
+
+  sql("TRUNCATE TABLE tab1")
+  assert(spark.table("tab1").collect().isEmpty)
+
+  val fileStatus2 = fs.getFileStatus(tablePath)
+  if (ignore) {
+assert(fileStatus2.getPermission().toString() == "rwxr-xr-x")
 
 Review comment:
   Good point! Let me update it.





[GitHub] [spark] AmplabJenkins commented on issue #27030: [SPARK-30244][SQL] Emit pre/post events for "Partition" methods in ExternalCatalogWithListener

2020-01-09 Thread GitBox
AmplabJenkins commented on issue #27030: [SPARK-30244][SQL] Emit pre/post 
events for "Partition" methods in ExternalCatalogWithListener
URL: https://github.com/apache/spark/pull/27030#issuecomment-572916416
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21251/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27030: [SPARK-30244][SQL] Emit pre/post events for "Partition" methods in ExternalCatalogWithListener

2020-01-09 Thread GitBox
AmplabJenkins commented on issue #27030: [SPARK-30244][SQL] Emit pre/post 
events for "Partition" methods in ExternalCatalogWithListener
URL: https://github.com/apache/spark/pull/27030#issuecomment-572916406
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] SparkQA removed a comment on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit

2020-01-09 Thread GitBox
SparkQA removed a comment on issue #27159: [DO-NOT-MERGE] Testing fix for 
pyspark test test_memory_limit
URL: https://github.com/apache/spark/pull/27159#issuecomment-572878167
 
 
   **[Test build #116450 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116450/testReport)**
 for PR 27159 at commit 
[`b77f9d8`](https://github.com/apache/spark/commit/b77f9d8759eddfc5c56c8de14037cdcf5dc0c989).





[GitHub] [spark] HeartSaVioR commented on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit

2020-01-09 Thread GitBox
HeartSaVioR commented on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark 
test test_memory_limit
URL: https://github.com/apache/spark/pull/27159#issuecomment-572916229
 
 
   3 test runs so far: 1 failed because of another flaky test, and 2 passed. 
It looks like the fix is working so far. Closing.





[GitHub] [spark] HeartSaVioR closed pull request #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit

2020-01-09 Thread GitBox
HeartSaVioR closed pull request #27159: [DO-NOT-MERGE] Testing fix for pyspark 
test test_memory_limit
URL: https://github.com/apache/spark/pull/27159
 
 
   





[GitHub] [spark] SparkQA commented on issue #27030: [SPARK-30244][SQL] Emit pre/post events for "Partition" methods in ExternalCatalogWithListener

2020-01-09 Thread GitBox
SparkQA commented on issue #27030: [SPARK-30244][SQL] Emit pre/post events for 
"Partition" methods in ExternalCatalogWithListener
URL: https://github.com/apache/spark/pull/27030#issuecomment-572915998
 
 
   **[Test build #116464 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116464/testReport)**
 for PR 27030 at commit 
[`3cb79fe`](https://github.com/apache/spark/commit/3cb79fe63a14ecba62bfc9b449ecf9b4a28b6e10).





[GitHub] [spark] beliefer commented on a change in pull request #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression

2020-01-09 Thread GitBox
beliefer commented on a change in pull request #27058: [SPARK-30395][SQL] When 
one or more DISTINCT aggregate expressions operate on the same field, the 
DISTINCT aggregate expression allows the use of the FILTER clause
URL: https://github.com/apache/spark/pull/27058#discussion_r365107970
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala
 ##
 @@ -316,6 +362,86 @@ object RewriteDistinctAggregates extends 
Rule[LogicalPlan] {
 }.asInstanceOf[NamedExpression]
   }
   Aggregate(groupByAttrs, patchedAggExpressions, firstAggregate)
+} else if (distinctAggGroups.size == 1) {
+  val (distinctAggExpressions, regularAggExpressions) = 
aggExpressions.partition(_.isDistinct)
+  if (distinctAggExpressions.exists(_.filter.isDefined)) {
+val regularAggExprs = regularAggExpressions.filter(e => 
e.children.exists(!_.foldable))
+val regularFunChildren = regularAggExprs
+  .flatMap(_.aggregateFunction.children.filter(!_.foldable))
+val regularFilterAttrs = regularAggExprs.flatMap(_.filterAttributes)
+val regularAggChildren = (regularFunChildren ++ 
regularFilterAttrs).distinct
+val regularAggChildAttrMap = 
regularAggChildren.map(expressionAttributePair)
+val regularAggChildAttrLookup = regularAggChildAttrMap.toMap
+val regularOperatorMap = regularAggExprs.map {
+  case ae @ AggregateExpression(af, _, _, filter, _) =>
+val newChildren = af.children.map(c => 
regularAggChildAttrLookup.getOrElse(c, c))
+val raf = 
af.withNewChildren(newChildren).asInstanceOf[AggregateFunction]
+val filterOpt = filter.map(_.transform {
+  case a: Attribute => regularAggChildAttrLookup.getOrElse(a, a)
+})
+val aggExpr = ae.copy(aggregateFunction = raf, filter = filterOpt)
+(ae, aggExpr)
+}
+val distinctAggExprs = distinctAggExpressions.filter(e => 
e.children.exists(!_.foldable))
+val rewriteDistinctOperatorMap = distinctAggExprs.zipWithIndex.map {
+  case (ae @ AggregateExpression(af, _, _, filter, _), i) =>
+// Why do we need to construct the phantom id ?
+// First, In order to reduce costs, it is better to handle the 
filter clause locally.
+// e.g. COUNT (DISTINCT a) FILTER (WHERE id > 1), evaluate 
expression
+// If(id > 1) 'a else null first, and use the result as output.
+// Second, If more than one DISTINCT aggregate expression uses the 
same column,
+// We need to construct the phantom attributes so as the output 
not lost.
+// e.g. SUM (DISTINCT a), COUNT (DISTINCT a) FILTER (WHERE id > 1) 
will output
+// attribute 'phantom1-a and attribute 'phantom2-a instead of two 
'a.
+// Note: We just need to illusion the expression with filter 
clause.
+// The illusionary mechanism may result in multiple distinct 
aggregations uses
+// different column, so we still need to call `rewrite`.
+val phantomId = i + 1
+val unfoldableChildren = af.children.filter(!_.foldable)
+val exprAttrs = unfoldableChildren.map { e =>
+  (e, AttributeReference(s"phantom$phantomId-${e.sql}", 
e.dataType, nullable = true)())
 
 Review comment:
   OK.





[GitHub] [spark] SparkQA commented on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit

2020-01-09 Thread GitBox
SparkQA commented on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test 
test_memory_limit
URL: https://github.com/apache/spark/pull/27159#issuecomment-572915778
 
 
   **[Test build #116450 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116450/testReport)** for PR 27159 at commit [`b77f9d8`](https://github.com/apache/spark/commit/b77f9d8759eddfc5c56c8de14037cdcf5dc0c989).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27130: [SPARK-25993][SQL][TESTS] Add test cases for CREATE EXTERNAL TABLE with subdirectories

2020-01-09 Thread GitBox
dongjoon-hyun commented on a change in pull request #27130: 
[SPARK-25993][SQL][TESTS] Add test cases for CREATE EXTERNAL TABLE with 
subdirectories
URL: https://github.com/apache/spark/pull/27130#discussion_r365107735
 
 

 ##
 File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSourceSuite.scala
 ##
 @@ -222,4 +223,127 @@ class HiveParquetSourceSuite extends 
ParquetPartitioningTest {
   assert(df4.columns === Array("str", "max_int"))
 }
   }
+
+  test("SPARK-25993 CREATE EXTERNAL TABLE with subdirectories") {
+Seq("true", "false").foreach { parquetConversion =>
+  withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> 
parquetConversion) {
+withTempPath { path =>
+  withTable("tbl1", "tbl2", "tbl3", "tbl4", "tbl5", "tbl6") {
+val someDF1 = Seq((1, 1, "parq1"), (2, 2, "parq2")).
+  toDF("c1", "c2", "c3").repartition(1)
+val someDF2 = Seq((3, 3, "parq3"), (4, 4, "parq4")).
+  toDF("c1", "c2", "c3").repartition(1)
+val someDF3 = Seq((5, 5, "parq5"), (6, 6, "parq6")).
+  toDF("c1", "c2", "c3").repartition(1)
+someDF1.write.parquet(s"${path.getCanonicalPath}/l1/")
+someDF2.write.parquet(s"${path.getCanonicalPath}/l1/l2/")
+someDF3.write.parquet(s"${path.getCanonicalPath}/l1/l2/l3/")
+
+val topDirStatement =
+  s"""
+ |CREATE EXTERNAL TABLE tbl1(
+ |  c1 int,
+ |  c2 int,
+ |  c3 string)
+ |STORED AS parquet
+ |LOCATION '${s"${path.getCanonicalPath}"}'""".stripMargin
+sql(topDirStatement)
+if (parquetConversion == "true") {
+  checkAnswer(sql("SELECT * FROM tbl1"), Nil)
+} else {
+  val msg = intercept[IOException] {sql("SELECT * FROM 
tbl1").show()
+  }.getMessage
+assert(msg.contains("Not a file:"))
+}
+
+val l1DirStatement =
+  s"""
+ |CREATE EXTERNAL TABLE tbl2(
+ |  c1 int,
+ |  c2 int,
+ |  c3 string)
+ |STORED AS parquet
+ |LOCATION '${s"${path.getCanonicalPath}/l1/"}'""".stripMargin
+sql(l1DirStatement)
+if (parquetConversion == "true") {
+  checkAnswer(sql("SELECT * FROM tbl2"),
+(1 to 2).map(i => Row(i, i, s"parq$i")))
+} else {
+  val msg = intercept[IOException] {
+sql("SELECT * FROM tbl2").show()
+  }.getMessage
+  assert(msg.contains("Not a file:"))
+}
+
+val l2DirStatement =
+  s"""
+ |CREATE EXTERNAL TABLE tbl3(
+ |  c1 int,
+ |  c2 int,
+ |  c3 string)
+ |STORED AS parquet
+ |LOCATION 
'${s"${path.getCanonicalPath}/l1/l2/"}'""".stripMargin
+sql(l2DirStatement)
+if (parquetConversion == "true") {
+  checkAnswer(sql("SELECT * FROM tbl3"),
+(3 to 4).map(i => Row(i, i, s"parq$i")))
+} else {
+  val msg = intercept[IOException] {sql("SELECT * FROM 
tbl3").show()
 
 Review comment:
   `sql("SELECT * FROM tbl3").show()`?





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27130: [SPARK-25993][SQL][TESTS] Add test cases for CREATE EXTERNAL TABLE with subdirectories

2020-01-09 Thread GitBox
dongjoon-hyun commented on a change in pull request #27130: 
[SPARK-25993][SQL][TESTS] Add test cases for CREATE EXTERNAL TABLE with 
subdirectories
URL: https://github.com/apache/spark/pull/27130#discussion_r365107848
 
 

 ##
 File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSourceSuite.scala
 ##
 @@ -222,4 +223,127 @@ class HiveParquetSourceSuite extends 
ParquetPartitioningTest {
   assert(df4.columns === Array("str", "max_int"))
 }
   }
+
+  test("SPARK-25993 CREATE EXTERNAL TABLE with subdirectories") {
+Seq("true", "false").foreach { parquetConversion =>
+  withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> 
parquetConversion) {
+withTempPath { path =>
+  withTable("tbl1", "tbl2", "tbl3", "tbl4", "tbl5", "tbl6") {
+val someDF1 = Seq((1, 1, "parq1"), (2, 2, "parq2")).
+  toDF("c1", "c2", "c3").repartition(1)
+val someDF2 = Seq((3, 3, "parq3"), (4, 4, "parq4")).
+  toDF("c1", "c2", "c3").repartition(1)
+val someDF3 = Seq((5, 5, "parq5"), (6, 6, "parq6")).
+  toDF("c1", "c2", "c3").repartition(1)
+someDF1.write.parquet(s"${path.getCanonicalPath}/l1/")
+someDF2.write.parquet(s"${path.getCanonicalPath}/l1/l2/")
+someDF3.write.parquet(s"${path.getCanonicalPath}/l1/l2/l3/")
+
+val topDirStatement =
+  s"""
+ |CREATE EXTERNAL TABLE tbl1(
+ |  c1 int,
+ |  c2 int,
+ |  c3 string)
+ |STORED AS parquet
+ |LOCATION '${s"${path.getCanonicalPath}"}'""".stripMargin
+sql(topDirStatement)
+if (parquetConversion == "true") {
+  checkAnswer(sql("SELECT * FROM tbl1"), Nil)
+} else {
+  val msg = intercept[IOException] {sql("SELECT * FROM 
tbl1").show()
+  }.getMessage
+assert(msg.contains("Not a file:"))
+}
+
+val l1DirStatement =
+  s"""
+ |CREATE EXTERNAL TABLE tbl2(
+ |  c1 int,
+ |  c2 int,
+ |  c3 string)
+ |STORED AS parquet
+ |LOCATION '${s"${path.getCanonicalPath}/l1/"}'""".stripMargin
+sql(l1DirStatement)
+if (parquetConversion == "true") {
+  checkAnswer(sql("SELECT * FROM tbl2"),
+(1 to 2).map(i => Row(i, i, s"parq$i")))
+} else {
+  val msg = intercept[IOException] {
+sql("SELECT * FROM tbl2").show()
+  }.getMessage
+  assert(msg.contains("Not a file:"))
+}
+
+val l2DirStatement =
+  s"""
+ |CREATE EXTERNAL TABLE tbl3(
+ |  c1 int,
+ |  c2 int,
+ |  c3 string)
+ |STORED AS parquet
+ |LOCATION 
'${s"${path.getCanonicalPath}/l1/l2/"}'""".stripMargin
+sql(l2DirStatement)
+if (parquetConversion == "true") {
+  checkAnswer(sql("SELECT * FROM tbl3"),
+(3 to 4).map(i => Row(i, i, s"parq$i")))
+} else {
+  val msg = intercept[IOException] {sql("SELECT * FROM 
tbl3").show()
+  }.getMessage
+  assert(msg.contains("Not a file:"))
+}
+
+val wildcardTopDirStatement =
+  s"""
+ |CREATE EXTERNAL TABLE tbl4(
+ |  c1 int,
+ |  c2 int,
+ |  c3 string)
+ |STORED AS parquet
+ |LOCATION '${new File(s"${path}/*").toURI}'""".stripMargin
+sql(wildcardTopDirStatement)
+if (parquetConversion == "true") {
+  checkAnswer(sql("SELECT * FROM tbl4"),
+(1 to 2).map(i => Row(i, i, s"parq$i")))
+} else {
+  val msg = intercept[IOException] {
+sql("SELECT * FROM tbl4").show()
+  }.getMessage
+  assert(msg.contains("Not a file:"))
+}
+
+val wildcardL1DirStatement =
+  s"""
+ |CREATE EXTERNAL TABLE tbl5(
+ |  c1 int,
+ |  c2 int,
+ |  c3 string)
+ |STORED AS parquet
+ |LOCATION '${new File(s"${path}/l1/*").toURI}'""".stripMargin
+sql(wildcardL1DirStatement)
+if (parquetConversion == "true") {
+  checkAnswer(sql("SELECT * FROM tbl5"),
+(1 to 4).map(i => Row(i, i, s"parq$i")))
+} else {
+  val msg = intercept[IOException] {sql("SELECT * FROM 
tbl5").show()
 
 Review comment:
   ditto.
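
   A small model of what the assertions in this test expect when `parquetConversion` is `"true"`. It is only a sketch in plain Scala, not Spark's file listing code. The root `/t`, the `Dir` class and `visibleRows` are hypothetical, and each directory's rows are represented by their `c3` labels. Under this model a plain `LOCATION` exposes only the files directly inside that directory, while a trailing `/*` exposes the files of that directory and of each immediate subdirectory.

```scala
object SubdirVisibilitySketch {
  // In-memory stand-in for the layout the test writes: each directory maps to
  // the rows stored directly inside it and to its immediate subdirectories.
  final case class Dir(files: Seq[String], subdirs: Seq[String])

  val layout: Map[String, Dir] = Map(
    "/t"          -> Dir(Nil, Seq("/t/l1")),
    "/t/l1"       -> Dir(Seq("parq1", "parq2"), Seq("/t/l1/l2")),
    "/t/l1/l2"    -> Dir(Seq("parq3", "parq4"), Seq("/t/l1/l2/l3")),
    "/t/l1/l2/l3" -> Dir(Seq("parq5", "parq6"), Nil)
  )

  // Rows visible for a LOCATION under this model (no recursion in either case).
  def visibleRows(location: String): Seq[String] =
    if (location.endsWith("/*")) {
      // Wildcard: the directory's own files plus one level of subdirectories.
      val parent = layout(location.stripSuffix("/*"))
      parent.files ++ parent.subdirs.flatMap(d => layout(d).files)
    } else {
      // Plain directory: only the files directly inside it.
      layout(location.stripSuffix("/")).files
    }
}
```

   With conversion set to `"false"` the test instead expects an `IOException` containing "Not a file:", which this model does not cover.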



[GitHub] [spark] beliefer commented on issue #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression allows the use of t

2020-01-09 Thread GitBox
beliefer commented on issue #27058: [SPARK-30395][SQL] When one or more 
DISTINCT aggregate expressions operate on the same field, the DISTINCT 
aggregate expression allows the use of the FILTER clause
URL: https://github.com/apache/spark/pull/27058#issuecomment-572915374
 
 
   > btw, we need a different approach for supporting multiple distinct groups 
(SPARK-30396)? Why did you split the distinct support into two?
   
   This PR will support
   `select a, sum(distinct b) filter (where ...) from t group by a;`
   There is only one DISTINCT aggregate expression, so there is only one set of columns being aggregated.
   `select a, sum(distinct b) filter (where ...), count(distinct b) filter (where ...) from t group by a;`
   There are two DISTINCT aggregate expressions, but both act on the same column.
   SPARK-30396 will support
   `select a, sum(distinct b) filter (where ...), count(distinct c) filter (where ...) from t group by a;`
   There are two DISTINCT aggregate expressions, and each acts on a different column.
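
   The split described above can be sketched as a check on how many different column sets the DISTINCT aggregates touch. This is an illustration only: `DistinctAgg`, `distinctGroups` and `singleDistinctGroup` are hypothetical names, and the real rule works on Catalyst expressions rather than on strings.

```scala
object DistinctGroupSketch {
  // Hypothetical, simplified description of one DISTINCT aggregate call.
  final case class DistinctAgg(function: String, columns: Set[String])

  // Number of different column sets the DISTINCT aggregates operate on.
  def distinctGroups(aggs: Seq[DistinctAgg]): Int =
    aggs.map(_.columns).distinct.size

  // True when every DISTINCT aggregate acts on the same columns, which is the
  // case the comment assigns to this PR rather than to SPARK-30396.
  def singleDistinctGroup(aggs: Seq[DistinctAgg]): Boolean =
    distinctGroups(aggs) <= 1
}
```

   For the three queries above, the first two yield a single group and the third yields two.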





[GitHub] [spark] viirya commented on issue #26978: [SPARK-29721][SQL] Prune unnecessary nested fields from Generate without Project

2020-01-09 Thread GitBox
viirya commented on issue #26978: [SPARK-29721][SQL] Prune unnecessary nested 
fields from Generate without Project
URL: https://github.com/apache/spark/pull/26978#issuecomment-572915404
 
 
   @dongjoon-hyun Yes, I do think so too. Let's see if we can have more details 
from @cloud-fan.





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27130: [SPARK-25993][SQL][TESTS] Add test cases for CREATE EXTERNAL TABLE with subdirectories

2020-01-09 Thread GitBox
dongjoon-hyun commented on a change in pull request #27130: 
[SPARK-25993][SQL][TESTS] Add test cases for CREATE EXTERNAL TABLE with 
subdirectories
URL: https://github.com/apache/spark/pull/27130#discussion_r365107663
 
 

 ##
 File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSourceSuite.scala
 ##
 @@ -222,4 +223,127 @@ class HiveParquetSourceSuite extends 
ParquetPartitioningTest {
   assert(df4.columns === Array("str", "max_int"))
 }
   }
+
+  test("SPARK-25993 CREATE EXTERNAL TABLE with subdirectories") {
+Seq("true", "false").foreach { parquetConversion =>
+  withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> 
parquetConversion) {
+withTempPath { path =>
+  withTable("tbl1", "tbl2", "tbl3", "tbl4", "tbl5", "tbl6") {
+val someDF1 = Seq((1, 1, "parq1"), (2, 2, "parq2")).
+  toDF("c1", "c2", "c3").repartition(1)
+val someDF2 = Seq((3, 3, "parq3"), (4, 4, "parq4")).
+  toDF("c1", "c2", "c3").repartition(1)
+val someDF3 = Seq((5, 5, "parq5"), (6, 6, "parq6")).
+  toDF("c1", "c2", "c3").repartition(1)
+someDF1.write.parquet(s"${path.getCanonicalPath}/l1/")
+someDF2.write.parquet(s"${path.getCanonicalPath}/l1/l2/")
+someDF3.write.parquet(s"${path.getCanonicalPath}/l1/l2/l3/")
+
+val topDirStatement =
+  s"""
+ |CREATE EXTERNAL TABLE tbl1(
+ |  c1 int,
+ |  c2 int,
+ |  c3 string)
+ |STORED AS parquet
+ |LOCATION '${s"${path.getCanonicalPath}"}'""".stripMargin
+sql(topDirStatement)
+if (parquetConversion == "true") {
+  checkAnswer(sql("SELECT * FROM tbl1"), Nil)
+} else {
+  val msg = intercept[IOException] {sql("SELECT * FROM 
tbl1").show()
+  }.getMessage
+assert(msg.contains("Not a file:"))
+}
+
+val l1DirStatement =
+  s"""
+ |CREATE EXTERNAL TABLE tbl2(
+ |  c1 int,
+ |  c2 int,
+ |  c3 string)
+ |STORED AS parquet
+ |LOCATION '${s"${path.getCanonicalPath}/l1/"}'""".stripMargin
+sql(l1DirStatement)
+if (parquetConversion == "true") {
+  checkAnswer(sql("SELECT * FROM tbl2"),
+(1 to 2).map(i => Row(i, i, s"parq$i")))
 
 Review comment:
   Can we merge 269 and 270 into one line here?





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27130: [SPARK-25993][SQL][TESTS] Add test cases for CREATE EXTERNAL TABLE with subdirectories

2020-01-09 Thread GitBox
dongjoon-hyun commented on a change in pull request #27130: 
[SPARK-25993][SQL][TESTS] Add test cases for CREATE EXTERNAL TABLE with 
subdirectories
URL: https://github.com/apache/spark/pull/27130#discussion_r365107292
 
 

 ##
 File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSourceSuite.scala
 ##
 @@ -222,4 +223,127 @@ class HiveParquetSourceSuite extends 
ParquetPartitioningTest {
   assert(df4.columns === Array("str", "max_int"))
 }
   }
+
+  test("SPARK-25993 CREATE EXTERNAL TABLE with subdirectories") {
+Seq("true", "false").foreach { parquetConversion =>
+  withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> 
parquetConversion) {
+withTempPath { path =>
+  withTable("tbl1", "tbl2", "tbl3", "tbl4", "tbl5", "tbl6") {
+val someDF1 = Seq((1, 1, "parq1"), (2, 2, "parq2")).
+  toDF("c1", "c2", "c3").repartition(1)
+val someDF2 = Seq((3, 3, "parq3"), (4, 4, "parq4")).
+  toDF("c1", "c2", "c3").repartition(1)
+val someDF3 = Seq((5, 5, "parq5"), (6, 6, "parq6")).
+  toDF("c1", "c2", "c3").repartition(1)
+someDF1.write.parquet(s"${path.getCanonicalPath}/l1/")
+someDF2.write.parquet(s"${path.getCanonicalPath}/l1/l2/")
+someDF3.write.parquet(s"${path.getCanonicalPath}/l1/l2/l3/")
+
+val topDirStatement =
+  s"""
+ |CREATE EXTERNAL TABLE tbl1(
+ |  c1 int,
+ |  c2 int,
+ |  c3 string)
+ |STORED AS parquet
+ |LOCATION '${s"${path.getCanonicalPath}"}'""".stripMargin
+sql(topDirStatement)
+if (parquetConversion == "true") {
+  checkAnswer(sql("SELECT * FROM tbl1"), Nil)
+} else {
+  val msg = intercept[IOException] {sql("SELECT * FROM 
tbl1").show()
 
 Review comment:
   `sql("SELECT * FROM tbl1").show()` seems to need to be in the next line.





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27130: [SPARK-25993][SQL][TESTS] Add test cases for CREATE EXTERNAL TABLE with subdirectories

2020-01-09 Thread GitBox
dongjoon-hyun commented on a change in pull request #27130: 
[SPARK-25993][SQL][TESTS] Add test cases for CREATE EXTERNAL TABLE with 
subdirectories
URL: https://github.com/apache/spark/pull/27130#discussion_r365107292
 
 

 ##
 File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSourceSuite.scala
 ##
 @@ -222,4 +223,127 @@ class HiveParquetSourceSuite extends 
ParquetPartitioningTest {
   assert(df4.columns === Array("str", "max_int"))
 }
   }
+
+  test("SPARK-25993 CREATE EXTERNAL TABLE with subdirectories") {
+Seq("true", "false").foreach { parquetConversion =>
+  withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> 
parquetConversion) {
+withTempPath { path =>
+  withTable("tbl1", "tbl2", "tbl3", "tbl4", "tbl5", "tbl6") {
+val someDF1 = Seq((1, 1, "parq1"), (2, 2, "parq2")).
+  toDF("c1", "c2", "c3").repartition(1)
+val someDF2 = Seq((3, 3, "parq3"), (4, 4, "parq4")).
+  toDF("c1", "c2", "c3").repartition(1)
+val someDF3 = Seq((5, 5, "parq5"), (6, 6, "parq6")).
+  toDF("c1", "c2", "c3").repartition(1)
+someDF1.write.parquet(s"${path.getCanonicalPath}/l1/")
+someDF2.write.parquet(s"${path.getCanonicalPath}/l1/l2/")
+someDF3.write.parquet(s"${path.getCanonicalPath}/l1/l2/l3/")
+
+val topDirStatement =
+  s"""
+ |CREATE EXTERNAL TABLE tbl1(
+ |  c1 int,
+ |  c2 int,
+ |  c3 string)
+ |STORED AS parquet
+ |LOCATION '${s"${path.getCanonicalPath}"}'""".stripMargin
+sql(topDirStatement)
+if (parquetConversion == "true") {
+  checkAnswer(sql("SELECT * FROM tbl1"), Nil)
+} else {
+  val msg = intercept[IOException] {sql("SELECT * FROM 
tbl1").show()
 
 Review comment:
   `sql("SELECT * FROM tbl1").show()` seems to need to be on the next line.





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27130: [SPARK-25993][SQL][TESTS] Add test cases for CREATE EXTERNAL TABLE with subdirectories

2020-01-09 Thread GitBox
dongjoon-hyun commented on a change in pull request #27130: 
[SPARK-25993][SQL][TESTS] Add test cases for CREATE EXTERNAL TABLE with 
subdirectories
URL: https://github.com/apache/spark/pull/27130#discussion_r365107448
 
 

 ##
 File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSourceSuite.scala
 ##
 @@ -222,4 +223,127 @@ class HiveParquetSourceSuite extends 
ParquetPartitioningTest {
   assert(df4.columns === Array("str", "max_int"))
 }
   }
+
+  test("SPARK-25993 CREATE EXTERNAL TABLE with subdirectories") {
+Seq("true", "false").foreach { parquetConversion =>
+  withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> 
parquetConversion) {
+withTempPath { path =>
+  withTable("tbl1", "tbl2", "tbl3", "tbl4", "tbl5", "tbl6") {
+val someDF1 = Seq((1, 1, "parq1"), (2, 2, "parq2")).
+  toDF("c1", "c2", "c3").repartition(1)
+val someDF2 = Seq((3, 3, "parq3"), (4, 4, "parq4")).
+  toDF("c1", "c2", "c3").repartition(1)
+val someDF3 = Seq((5, 5, "parq5"), (6, 6, "parq6")).
+  toDF("c1", "c2", "c3").repartition(1)
+someDF1.write.parquet(s"${path.getCanonicalPath}/l1/")
+someDF2.write.parquet(s"${path.getCanonicalPath}/l1/l2/")
+someDF3.write.parquet(s"${path.getCanonicalPath}/l1/l2/l3/")
+
+val topDirStatement =
+  s"""
+ |CREATE EXTERNAL TABLE tbl1(
+ |  c1 int,
+ |  c2 int,
+ |  c3 string)
+ |STORED AS parquet
+ |LOCATION '${s"${path.getCanonicalPath}"}'""".stripMargin
+sql(topDirStatement)
+if (parquetConversion == "true") {
+  checkAnswer(sql("SELECT * FROM tbl1"), Nil)
+} else {
+  val msg = intercept[IOException] {sql("SELECT * FROM 
tbl1").show()
+  }.getMessage
+assert(msg.contains("Not a file:"))
 
 Review comment:
   indentation?





[GitHub] [spark] cloud-fan commented on issue #26881: [SPARK-30252][SQL] Disallow negative scale of Decimal under ansi mode

2020-01-09 Thread GitBox
cloud-fan commented on issue #26881: [SPARK-30252][SQL] Disallow negative scale 
of Decimal under ansi mode
URL: https://github.com/apache/spark/pull/26881#issuecomment-572914440
 
 
   also cc @viirya @maropu 





[GitHub] [spark] peter-toth commented on a change in pull request #27119: [SPARK-30447][SQL] Constant propagation nullability issue

2020-01-09 Thread GitBox
peter-toth commented on a change in pull request #27119: [SPARK-30447][SQL] 
Constant propagation nullability issue
URL: https://github.com/apache/spark/pull/27119#discussion_r365106443
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ##
 @@ -132,11 +142,14 @@ object ConstantPropagation extends Rule[LogicalPlan] 
with PredicateHelper {
 (newSelf, Seq.empty)
   case n: Not =>
 // Ignore the EqualityPredicates from children since they are only 
propagated through And.
-val (newChild, _) = traverse(n.child, replaceChildren = true)
+val (newChild, _) = traverse(n.child, replaceChildren = true, 
nullIsFalse = false)
 (newChild.map(Not), Seq.empty)
   case _ => (None, Seq.empty)
 }
 
+  private def safeToReplace(ar: AttributeReference, nullIsFalse: Boolean) =
 
 Review comment:
   Ok, added. Let me know if it needs rewording.
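
   For readers following the `nullIsFalse` change in the hunk above, here is a sketch of why replacing a nullable attribute with a constant can be unsafe under `Not`. It models SQL three-valued logic with plain Scala `Option` (`None` stands for NULL). The predicate `a = 1 AND b = a + 2` and all names in the block are illustrative assumptions, not taken from the PR.

```scala
object NullableReplaceSketch {
  // SQL three-valued logic: None stands for NULL.
  type Tri = Option[Boolean]

  def and(l: Tri, r: Tri): Tri = (l, r) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  def not(v: Tri): Tri = v.map(!_)

  // Equality yields NULL as soon as either side is NULL.
  def equalTo(l: Option[Int], r: Option[Int]): Tri =
    for (x <- l; y <- r) yield x == y

  // Original predicate: a = 1 AND b = a + 2
  def original(a: Option[Int], b: Option[Int]): Tri =
    and(equalTo(a, Some(1)), equalTo(b, a.map(_ + 2)))

  // After propagating a = 1 into the second conjunct: a = 1 AND b = 3
  def propagated(a: Option[Int], b: Option[Int]): Tri =
    and(equalTo(a, Some(1)), equalTo(b, Some(3)))
}
```

   With `a` NULL and `b = 4`, the original predicate is NULL while the propagated one is false. A filter treats both as "not true", so the rewrite is harmless there, but `not(...)` turns NULL into NULL and false into true, so the two forms diverge once the predicate sits under `Not`.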





[GitHub] [spark] AmplabJenkins commented on issue #27164: [WIP][SPARK-30479][SQL] Apply compaction of event log to SQL events

2020-01-09 Thread GitBox
AmplabJenkins commented on issue #27164: [WIP][SPARK-30479][SQL] Apply 
compaction of event log to SQL events
URL: https://github.com/apache/spark/pull/27164#issuecomment-572913804
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27119: [SPARK-30447][SQL] Constant propagation nullability issue

2020-01-09 Thread GitBox
AmplabJenkins commented on issue #27119: [SPARK-30447][SQL] Constant 
propagation nullability issue
URL: https://github.com/apache/spark/pull/27119#issuecomment-572913885
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID

2020-01-09 Thread GitBox
AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 
or more grouping attributes for GROUPING_ID 
URL: https://github.com/apache/spark/pull/26918#issuecomment-572913895
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27119: [SPARK-30447][SQL] Constant propagation nullability issue

2020-01-09 Thread GitBox
AmplabJenkins commented on issue #27119: [SPARK-30447][SQL] Constant 
propagation nullability issue
URL: https://github.com/apache/spark/pull/27119#issuecomment-572913893
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21249/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27164: [WIP][SPARK-30479][SQL] Apply compaction of event log to SQL events

2020-01-09 Thread GitBox
AmplabJenkins removed a comment on issue #27164: [WIP][SPARK-30479][SQL] Apply 
compaction of event log to SQL events
URL: https://github.com/apache/spark/pull/27164#issuecomment-572913804
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID

2020-01-09 Thread GitBox
AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more 
grouping attributes for GROUPING_ID 
URL: https://github.com/apache/spark/pull/26918#issuecomment-572913902
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21250/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27164: [WIP][SPARK-30479][SQL] Apply compaction of event log to SQL events

2020-01-09 Thread GitBox
AmplabJenkins removed a comment on issue #27164: [WIP][SPARK-30479][SQL] Apply 
compaction of event log to SQL events
URL: https://github.com/apache/spark/pull/27164#issuecomment-572913812
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21248/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27164: [WIP][SPARK-30479][SQL] Apply compaction of event log to SQL events

2020-01-09 Thread GitBox
AmplabJenkins commented on issue #27164: [WIP][SPARK-30479][SQL] Apply 
compaction of event log to SQL events
URL: https://github.com/apache/spark/pull/27164#issuecomment-572913812
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21248/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27119: [SPARK-30447][SQL] Constant propagation nullability issue

2020-01-09 Thread GitBox
AmplabJenkins removed a comment on issue #27119: [SPARK-30447][SQL] Constant 
propagation nullability issue
URL: https://github.com/apache/spark/pull/27119#issuecomment-572913893
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21249/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27119: [SPARK-30447][SQL] Constant propagation nullability issue

2020-01-09 Thread GitBox
AmplabJenkins removed a comment on issue #27119: [SPARK-30447][SQL] Constant 
propagation nullability issue
URL: https://github.com/apache/spark/pull/27119#issuecomment-572913885
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] dongjoon-hyun commented on issue #26978: [SPARK-29721][SQL] Prune unnecessary nested fields from Generate without Project

2020-01-09 Thread GitBox
dongjoon-hyun commented on issue #26978: [SPARK-29721][SQL] Prune unnecessary 
nested fields from Generate without Project
URL: https://github.com/apache/spark/pull/26978#issuecomment-572913802
 
 
   Shall we hold on this PR a little bit until the bug of 
https://github.com/apache/spark/pull/24637 is identified and resolved?





[GitHub] [spark] AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID

2020-01-09 Thread GitBox
AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more 
grouping attributes for GROUPING_ID 
URL: https://github.com/apache/spark/pull/26918#issuecomment-572913895
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID

2020-01-09 Thread GitBox
AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 
or more grouping attributes for GROUPING_ID 
URL: https://github.com/apache/spark/pull/26918#issuecomment-572913902
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21250/
   Test PASSed.





[GitHub] [spark] SparkQA commented on issue #27164: [WIP][SPARK-30479][SQL] Apply compaction of event log to SQL events

2020-01-09 Thread GitBox
SparkQA commented on issue #27164: [WIP][SPARK-30479][SQL] Apply compaction of 
event log to SQL events
URL: https://github.com/apache/spark/pull/27164#issuecomment-572913400
 
 
   **[Test build #116461 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116461/testReport)**
 for PR 27164 at commit 
[`c72fd1f`](https://github.com/apache/spark/commit/c72fd1facd509a49401f1dc492227a5c2fac41ae).





[GitHub] [spark] SparkQA commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID

2020-01-09 Thread GitBox
SparkQA commented on issue #26918: [SPARK-30279][SQL] Support 32 or more 
grouping attributes for GROUPING_ID 
URL: https://github.com/apache/spark/pull/26918#issuecomment-572913427
 
 
   **[Test build #116463 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116463/testReport)**
 for PR 26918 at commit 
[`10e6bcd`](https://github.com/apache/spark/commit/10e6bcd2a685d86c3f3a1316f43ae08de934f54c).





[GitHub] [spark] SparkQA commented on issue #27119: [SPARK-30447][SQL] Constant propagation nullability issue

2020-01-09 Thread GitBox
SparkQA commented on issue #27119: [SPARK-30447][SQL] Constant propagation 
nullability issue
URL: https://github.com/apache/spark/pull/27119#issuecomment-572913429
 
 
   **[Test build #116462 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116462/testReport)**
 for PR 27119 at commit 
[`c8a21cc`](https://github.com/apache/spark/commit/c8a21cc29682800e79213ef371f51871e9c44551).





[GitHub] [spark] maropu commented on issue #27119: [SPARK-30447][SQL] Constant propagation nullability issue

2020-01-09 Thread GitBox
maropu commented on issue #27119: [SPARK-30447][SQL] Constant propagation 
nullability issue
URL: https://github.com/apache/spark/pull/27119#issuecomment-572912992
 
 
   LGTM, pending Jenkins.





[GitHub] [spark] ayudovin commented on a change in pull request #27030: [SPARK-30244][SQL] Emit pre/post events for "Partition" methods in ExternalCatalogWithListener

2020-01-09 Thread GitBox
ayudovin commented on a change in pull request #27030: [SPARK-30244][SQL] Emit 
pre/post events for "Partition" methods in ExternalCatalogWithListener
URL: https://github.com/apache/spark/pull/27030#discussion_r365105197
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/events.scala
 ##
 @@ -204,61 +204,64 @@ case class RenameFunctionEvent(
 newName: String)
   extends FunctionEvent
 
-trait PartitionEvent extends DatabaseEvent {
-  /**
-   * Name of the table that was touched.
-   */
-  val name: String
-}
-
+/**
+ * Event fired when a partition is created, dropped, altered or renamed.
+ */
+trait PartitionEvent extends TableEvent
 
 /**
  * Event fired before a partition is created.
  */
-case class CreatePartitionPreEvent(database: String, name: String,
-   parts: Seq[CatalogTablePartition]) extends PartitionEvent
+case class CreatePartitionPreEvent(
+database: String,
+name: String) extends PartitionEvent
 
 Review comment:
   > shall we put `extends PartitionEvent` into a new line
   
   Yes, I'll put it on a new line.
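
   For illustration, the layout being agreed on might look like the sketch below. It assumes the common Spark convention of a 4-space indent for constructor parameters and a 2-space indent for `extends` on its own line. `TableEvent` here is a simplified, hypothetical stand-in for the real trait in events.scala, included only so the snippet compiles on its own.

```scala
// Hypothetical, simplified stand-in for Spark's TableEvent (assumption for this sketch).
trait TableEvent {
  val database: String
  val name: String
}

// Event fired when a partition is created, dropped, altered or renamed.
trait PartitionEvent extends TableEvent

// Parameters are indented by 4 spaces; `extends` sits on its own line, indented by 2.
case class CreatePartitionPreEvent(
    database: String,
    name: String)
  extends PartitionEvent
```

   The change is purely cosmetic: the class compiles to the same thing either way, but the supertype no longer hides at the end of the last parameter line.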





[GitHub] [spark] yaooqinn edited a comment on issue #27163: Revert "[SPARK-29393][SQL] Add `make_interval` function"

2020-01-09 Thread GitBox
yaooqinn edited a comment on issue #27163: Revert "[SPARK-29393][SQL] Add 
`make_interval` function"
URL: https://github.com/apache/spark/pull/27163#issuecomment-572912259
 
 
   Sorry for my rudeness and thoughtlessness. I didn't mean to hurt anybody.
   
   





[GitHub] [spark] maropu commented on a change in pull request #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression al

2020-01-09 Thread GitBox
maropu commented on a change in pull request #27058: [SPARK-30395][SQL] When 
one or more DISTINCT aggregate expressions operate on the same field, the 
DISTINCT aggregate expression allows the use of the FILTER clause
URL: https://github.com/apache/spark/pull/27058#discussion_r365104343
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala
 ##
 @@ -316,6 +362,86 @@ object RewriteDistinctAggregates extends Rule[LogicalPlan] {
 }.asInstanceOf[NamedExpression]
   }
   Aggregate(groupByAttrs, patchedAggExpressions, firstAggregate)
+} else if (distinctAggGroups.size == 1) {
+  val (distinctAggExpressions, regularAggExpressions) = aggExpressions.partition(_.isDistinct)
+  if (distinctAggExpressions.exists(_.filter.isDefined)) {
+val regularAggExprs = regularAggExpressions.filter(e => e.children.exists(!_.foldable))
+val regularFunChildren = regularAggExprs
+  .flatMap(_.aggregateFunction.children.filter(!_.foldable))
+val regularFilterAttrs = regularAggExprs.flatMap(_.filterAttributes)
+val regularAggChildren = (regularFunChildren ++ regularFilterAttrs).distinct
+val regularAggChildAttrMap = regularAggChildren.map(expressionAttributePair)
+val regularAggChildAttrLookup = regularAggChildAttrMap.toMap
+val regularOperatorMap = regularAggExprs.map {
+  case ae @ AggregateExpression(af, _, _, filter, _) =>
+val newChildren = af.children.map(c => regularAggChildAttrLookup.getOrElse(c, c))
+val raf = af.withNewChildren(newChildren).asInstanceOf[AggregateFunction]
+val filterOpt = filter.map(_.transform {
+  case a: Attribute => regularAggChildAttrLookup.getOrElse(a, a)
+})
+val aggExpr = ae.copy(aggregateFunction = raf, filter = filterOpt)
+(ae, aggExpr)
+}
+val distinctAggExprs = distinctAggExpressions.filter(e => e.children.exists(!_.foldable))
+val rewriteDistinctOperatorMap = distinctAggExprs.zipWithIndex.map {
+  case (ae @ AggregateExpression(af, _, _, filter, _), i) =>
+// Why do we need to construct the phantom id ?
+// First, In order to reduce costs, it is better to handle the filter clause locally.
+// e.g. COUNT (DISTINCT a) FILTER (WHERE id > 1), evaluate expression
+// If(id > 1) 'a else null first, and use the result as output.
+// Second, If more than one DISTINCT aggregate expression uses the same column,
+// We need to construct the phantom attributes so as the output not lost.
+// e.g. SUM (DISTINCT a), COUNT (DISTINCT a) FILTER (WHERE id > 1) will output
+// attribute 'phantom1-a and attribute 'phantom2-a instead of two 'a.
+// Note: We just need to illusion the expression with filter clause.
+// The illusionary mechanism may result in multiple distinct aggregations uses
+// different column, so we still need to call `rewrite`.
+val phantomId = i + 1
+val unfoldableChildren = af.children.filter(!_.foldable)
+val exprAttrs = unfoldableChildren.map { e =>
+  (e, AttributeReference(s"phantom$phantomId-${e.sql}", e.dataType, nullable = true)())
 
 Review comment:
   Can we use expr IDs (e.g., `_gen_distinct_group_${exprId.id}`) instead, just 
like here:
   
https://github.com/apache/spark/blob/afd70a0f6fc1b44164e41a57dfc4fd8a5df642e1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala#L126
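
   A minimal sketch of the difference between the two naming schemes, written without any Spark dependency. `ExprId`, `Attr`, and `DistinctGroupNaming` are hypothetical stand-ins invented for this illustration; they are not Catalyst's classes, and the real `AttributeReference` carries more fields.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical stand-ins for Catalyst's ExprId and AttributeReference (assumptions).
final case class ExprId(id: Long)
final case class Attr(name: String, exprId: ExprId)

object DistinctGroupNaming {
  private val nextId = new AtomicLong(0L)

  // Allocates a fresh id for every call within this JVM.
  def freshExprId(): ExprId = ExprId(nextId.getAndIncrement())

  // Positional scheme from the diff: the name depends on the index of the
  // aggregate expression and on the SQL text of its child.
  def phantomName(index: Int, childSql: String): String =
    s"phantom${index + 1}-$childSql"

  // Id-based scheme from the review comment: the name depends only on the
  // freshly allocated expression id, so it cannot clash with a user column
  // that happens to share the child's SQL text.
  def genAttr(): Attr = {
    val exprId = freshExprId()
    Attr(s"_gen_distinct_group_${exprId.id}", exprId)
  }
}
```

   With the id-based scheme, two generated attributes always get different names, even when they are derived from the same child expression.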





[GitHub] [spark] yaooqinn commented on issue #27163: Revert "[SPARK-29393][SQL] Add `make_interval` function"

2020-01-09 Thread GitBox
yaooqinn commented on issue #27163: Revert "[SPARK-29393][SQL] Add 
`make_interval` function"
URL: https://github.com/apache/spark/pull/27163#issuecomment-572912259
 
 
   Sorry for my rudeness and thoughtlessness. I didn't mean to hurt anybody.
   
   





[GitHub] [spark] AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID

2020-01-09 Thread GitBox
AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 
or more grouping attributes for GROUPING_ID 
URL: https://github.com/apache/spark/pull/26918#issuecomment-572911727
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116460/
   Test FAILed.





[GitHub] [spark] SparkQA commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID

2020-01-09 Thread GitBox
SparkQA commented on issue #26918: [SPARK-30279][SQL] Support 32 or more 
grouping attributes for GROUPING_ID 
URL: https://github.com/apache/spark/pull/26918#issuecomment-572911706
 
 
   **[Test build #116460 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116460/testReport)**
 for PR 26918 at commit 
[`7a33558`](https://github.com/apache/spark/commit/7a335588ef2d5a526f21635e13fb59fd5ffd6824).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID

2020-01-09 Thread GitBox
AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 
or more grouping attributes for GROUPING_ID 
URL: https://github.com/apache/spark/pull/26918#issuecomment-572911720
 
 
   Merged build finished. Test FAILed.





[GitHub] [spark] SparkQA removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID

2020-01-09 Thread GitBox
SparkQA removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or 
more grouping attributes for GROUPING_ID 
URL: https://github.com/apache/spark/pull/26918#issuecomment-572911095
 
 
   **[Test build #116460 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116460/testReport)**
 for PR 26918 at commit 
[`7a33558`](https://github.com/apache/spark/commit/7a335588ef2d5a526f21635e13fb59fd5ffd6824).





[GitHub] [spark] AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID

2020-01-09 Thread GitBox
AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more 
grouping attributes for GROUPING_ID 
URL: https://github.com/apache/spark/pull/26918#issuecomment-572911720
 
 
   Merged build finished. Test FAILed.





[GitHub] [spark] AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID

2020-01-09 Thread GitBox
AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more 
grouping attributes for GROUPING_ID 
URL: https://github.com/apache/spark/pull/26918#issuecomment-572911727
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116460/
   Test FAILed.





[GitHub] [spark] HeartSaVioR closed pull request #27086: [WIP][SPARK-29779][SQL] Compact old event log files and cleanup - part 2

2020-01-09 Thread GitBox
HeartSaVioR closed pull request #27086: [WIP][SPARK-29779][SQL] Compact old 
event log files and cleanup - part 2
URL: https://github.com/apache/spark/pull/27086
 
 
   





[GitHub] [spark] HeartSaVioR commented on issue #27086: [WIP][SPARK-29779][SQL] Compact old event log files and cleanup - part 2

2020-01-09 Thread GitBox
HeartSaVioR commented on issue #27086: [WIP][SPARK-29779][SQL] Compact old 
event log files and cleanup - part 2
URL: https://github.com/apache/spark/pull/27086#issuecomment-572911816
 
 
   replaced by #27164





[GitHub] [spark] AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID

2020-01-09 Thread GitBox
AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more 
grouping attributes for GROUPING_ID 
URL: https://github.com/apache/spark/pull/26918#issuecomment-572911502
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] maropu commented on a change in pull request #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression al

2020-01-09 Thread GitBox
maropu commented on a change in pull request #27058: [SPARK-30395][SQL] When 
one or more DISTINCT aggregate expressions operate on the same field, the 
DISTINCT aggregate expression allows the use of the FILTER clause
URL: https://github.com/apache/spark/pull/27058#discussion_r365104343
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala
 ##
 @@ -316,6 +362,86 @@ object RewriteDistinctAggregates extends Rule[LogicalPlan] {
 }.asInstanceOf[NamedExpression]
   }
   Aggregate(groupByAttrs, patchedAggExpressions, firstAggregate)
+} else if (distinctAggGroups.size == 1) {
+  val (distinctAggExpressions, regularAggExpressions) = aggExpressions.partition(_.isDistinct)
+  if (distinctAggExpressions.exists(_.filter.isDefined)) {
+val regularAggExprs = regularAggExpressions.filter(e => e.children.exists(!_.foldable))
+val regularFunChildren = regularAggExprs
+  .flatMap(_.aggregateFunction.children.filter(!_.foldable))
+val regularFilterAttrs = regularAggExprs.flatMap(_.filterAttributes)
+val regularAggChildren = (regularFunChildren ++ regularFilterAttrs).distinct
+val regularAggChildAttrMap = regularAggChildren.map(expressionAttributePair)
+val regularAggChildAttrLookup = regularAggChildAttrMap.toMap
+val regularOperatorMap = regularAggExprs.map {
+  case ae @ AggregateExpression(af, _, _, filter, _) =>
+val newChildren = af.children.map(c => regularAggChildAttrLookup.getOrElse(c, c))
+val raf = af.withNewChildren(newChildren).asInstanceOf[AggregateFunction]
+val filterOpt = filter.map(_.transform {
+  case a: Attribute => regularAggChildAttrLookup.getOrElse(a, a)
+})
+val aggExpr = ae.copy(aggregateFunction = raf, filter = filterOpt)
+(ae, aggExpr)
+}
+val distinctAggExprs = distinctAggExpressions.filter(e => e.children.exists(!_.foldable))
+val rewriteDistinctOperatorMap = distinctAggExprs.zipWithIndex.map {
+  case (ae @ AggregateExpression(af, _, _, filter, _), i) =>
+// Why do we need to construct the phantom id ?
+// First, In order to reduce costs, it is better to handle the filter clause locally.
+// e.g. COUNT (DISTINCT a) FILTER (WHERE id > 1), evaluate expression
+// If(id > 1) 'a else null first, and use the result as output.
+// Second, If more than one DISTINCT aggregate expression uses the same column,
+// We need to construct the phantom attributes so as the output not lost.
+// e.g. SUM (DISTINCT a), COUNT (DISTINCT a) FILTER (WHERE id > 1) will output
+// attribute 'phantom1-a and attribute 'phantom2-a instead of two 'a.
+// Note: We just need to illusion the expression with filter clause.
+// The illusionary mechanism may result in multiple distinct aggregations uses
+// different column, so we still need to call `rewrite`.
+val phantomId = i + 1
+val unfoldableChildren = af.children.filter(!_.foldable)
+val exprAttrs = unfoldableChildren.map { e =>
+  (e, AttributeReference(s"phantom$phantomId-${e.sql}", e.dataType, nullable = true)())
 
 Review comment:
   Can we use expr IDs (e.g., `_distinct_group_${exprId.id}`) instead, just 
like here:
   
https://github.com/apache/spark/blob/afd70a0f6fc1b44164e41a57dfc4fd8a5df642e1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala#L126
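
   For context, the rewrite described in the quoted code comment relies on `COUNT(DISTINCT a) FILTER (WHERE id > 1)` giving the same answer as first projecting `if (id > 1) a else null` and then counting distinct non-null values. The sketch below checks that equivalence on plain Scala collections. It is only an illustration under that assumption: `Row` and `FilteredDistinctSketch` are hypothetical names, and `Option` stands in for SQL NULL.

```scala
object FilteredDistinctSketch {
  // Hypothetical row type; `a = None` models a SQL NULL in column a.
  final case class Row(id: Int, a: Option[Int])

  // COUNT(DISTINCT a) FILTER (WHERE id > 1): drop rows failing the predicate,
  // drop NULLs, then count the distinct values.
  def filterThenCount(rows: Seq[Row]): Int =
    rows.filter(_.id > 1).flatMap(_.a).distinct.size

  // Same result via projection: evaluate `if (id > 1) a else null` per row,
  // then count the distinct non-null projected values.
  def projectThenCount(rows: Seq[Row]): Int =
    rows.flatMap(r => if (r.id > 1) r.a else None).distinct.size
}
```

   Because the predicate is folded into the projected value, the filter can be handled locally before the distinct aggregation runs, which is the cost argument made in the comment.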





[GitHub] [spark] AmplabJenkins commented on issue #27019: [SPARK-30027][SQL] Support codegen for aggregate filters in HashAggregateExec

2020-01-09 Thread GitBox
AmplabJenkins commented on issue #27019: [SPARK-30027][SQL] Support codegen for 
aggregate filters in HashAggregateExec
URL: https://github.com/apache/spark/pull/27019#issuecomment-572911518
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21246/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27129: [SPARK-30427] Add config item for limiting partition number when calculating statistics through File System

2020-01-09 Thread GitBox
AmplabJenkins removed a comment on issue #27129: [SPARK-30427] Add config item 
for limiting partition number when calculating statistics through File System
URL: https://github.com/apache/spark/pull/27129#issuecomment-572911413
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID

2020-01-09 Thread GitBox
AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 
or more grouping attributes for GROUPING_ID 
URL: https://github.com/apache/spark/pull/26918#issuecomment-572911508
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21247/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27019: [SPARK-30027][SQL] Support codegen for aggregate filters in HashAggregateExec

2020-01-09 Thread GitBox
AmplabJenkins removed a comment on issue #27019: [SPARK-30027][SQL] Support 
codegen for aggregate filters in HashAggregateExec
URL: https://github.com/apache/spark/pull/27019#issuecomment-572911518
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21246/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27129: [SPARK-30427] Add config item for limiting partition number when calculating statistics through File System

2020-01-09 Thread GitBox
AmplabJenkins commented on issue #27129: [SPARK-30427] Add config item for 
limiting partition number when calculating statistics through File System
URL: https://github.com/apache/spark/pull/27129#issuecomment-572911425
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21245/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27019: [SPARK-30027][SQL] Support codegen for aggregate filters in HashAggregateExec

2020-01-09 Thread GitBox
AmplabJenkins removed a comment on issue #27019: [SPARK-30027][SQL] Support 
codegen for aggregate filters in HashAggregateExec
URL: https://github.com/apache/spark/pull/27019#issuecomment-572911505
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID

2020-01-09 Thread GitBox
AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 
or more grouping attributes for GROUPING_ID 
URL: https://github.com/apache/spark/pull/26918#issuecomment-572911502
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] HeartSaVioR commented on issue #27164: [WIP][SPARK-30479][SQL] Apply compaction of event log to SQL events

2020-01-09 Thread GitBox
HeartSaVioR commented on issue #27164: [WIP][SPARK-30479][SQL] Apply compaction 
of event log to SQL events
URL: https://github.com/apache/spark/pull/27164#issuecomment-572911604
 
 
   Will rebase once SPARK-29779 is merged. For now, the only effective commit 
is the last, c72fd1f.





[GitHub] [spark] AmplabJenkins commented on issue #27129: [SPARK-30427] Add config item for limiting partition number when calculating statistics through File System

2020-01-09 Thread GitBox
AmplabJenkins commented on issue #27129: [SPARK-30427] Add config item for 
limiting partition number when calculating statistics through File System
URL: https://github.com/apache/spark/pull/27129#issuecomment-572911413
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27129: [SPARK-30427] Add config item for limiting partition number when calculating statistics through File System

2020-01-09 Thread GitBox
AmplabJenkins removed a comment on issue #27129: [SPARK-30427] Add config item 
for limiting partition number when calculating statistics through File System
URL: https://github.com/apache/spark/pull/27129#issuecomment-572911425
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21245/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27019: [SPARK-30027][SQL] Support codegen for aggregate filters in HashAggregateExec

2020-01-09 Thread GitBox
AmplabJenkins commented on issue #27019: [SPARK-30027][SQL] Support codegen for 
aggregate filters in HashAggregateExec
URL: https://github.com/apache/spark/pull/27019#issuecomment-572911505
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID

2020-01-09 Thread GitBox
AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more 
grouping attributes for GROUPING_ID 
URL: https://github.com/apache/spark/pull/26918#issuecomment-572911508
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21247/
   Test PASSed.





[GitHub] [spark] HeartSaVioR opened a new pull request #27164: [WIP][SPARK-30479][SQL] Apply compaction of event log to SQL events

2020-01-09 Thread GitBox
HeartSaVioR opened a new pull request #27164: [WIP][SPARK-30479][SQL] Apply 
compaction of event log to SQL events
URL: https://github.com/apache/spark/pull/27164
 
 
   ### What changes were proposed in this pull request?
   
   This patch adds an event filter to handle SQL-related events. It is the next 
task after SPARK-29779 (#27085); please refer to the description of PR #27085 
for the overall rationale of this patch.
   
   The following functionality will be addressed in later parts:
   
   * integrate compaction into FsHistoryProvider
   * documentation about new configuration
   
   ### Why are the changes needed?
   
   One of the major goals of SPARK-28594 is to prevent event logs from becoming 
too large, and SPARK-29779 achieves that goal. We tried another approach 
previously, but it required models in both KVStore and live entities to 
guarantee compatibility, which they are not designed to do.
   
   ### Does this PR introduce any user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Added UTs.





[GitHub] [spark] SparkQA commented on issue #27019: [SPARK-30027][SQL] Support codegen for aggregate filters in HashAggregateExec

2020-01-09 Thread GitBox
SparkQA commented on issue #27019: [SPARK-30027][SQL] Support codegen for 
aggregate filters in HashAggregateExec
URL: https://github.com/apache/spark/pull/27019#issuecomment-572911070
 
 
   **[Test build #116458 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116458/testReport)**
 for PR 27019 at commit 
[`e449d3c`](https://github.com/apache/spark/commit/e449d3c8ca24d4ec1da2080ebb37ca2307e2ca76).





[GitHub] [spark] SparkQA commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID

2020-01-09 Thread GitBox
SparkQA commented on issue #26918: [SPARK-30279][SQL] Support 32 or more 
grouping attributes for GROUPING_ID 
URL: https://github.com/apache/spark/pull/26918#issuecomment-572911095
 
 
   **[Test build #116460 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116460/testReport)**
 for PR 26918 at commit 
[`7a33558`](https://github.com/apache/spark/commit/7a335588ef2d5a526f21635e13fb59fd5ffd6824).





[GitHub] [spark] SparkQA commented on issue #27129: [SPARK-30427] Add config item for limiting partition number when calculating statistics through File System

2020-01-09 Thread GitBox
SparkQA commented on issue #27129: [SPARK-30427] Add config item for limiting 
partition number when calculating statistics through File System
URL: https://github.com/apache/spark/pull/27129#issuecomment-572911071
 
 
   **[Test build #116459 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116459/testReport)**
 for PR 27129 at commit 
[`2ff9960`](https://github.com/apache/spark/commit/2ff9960aeda3784c951b0d100d09d7081f012645).





[GitHub] [spark] maropu commented on issue #27019: [SPARK-30027][SQL] Support codegen for aggregate filters in HashAggregateExec

2020-01-09 Thread GitBox
maropu commented on issue #27019: [SPARK-30027][SQL] Support codegen for 
aggregate filters in HashAggregateExec
URL: https://github.com/apache/spark/pull/27019#issuecomment-572910240
 
 
   retest this please





[GitHub] [spark] AmplabJenkins removed a comment on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit

2020-01-09 Thread GitBox
AmplabJenkins removed a comment on issue #27159: [DO-NOT-MERGE] Testing fix for 
pyspark test test_memory_limit
URL: https://github.com/apache/spark/pull/27159#issuecomment-572909274
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116451/
   Test FAILed.





[GitHub] [spark] AmplabJenkins commented on issue #27155: [SPARK-17636][SPARK-25557][SQL] Parquet and ORC predicate pushdown in nested fields

2020-01-09 Thread GitBox
AmplabJenkins commented on issue #27155: [SPARK-17636][SPARK-25557][SQL] 
Parquet and ORC predicate pushdown in nested fields
URL: https://github.com/apache/spark/pull/27155#issuecomment-572909353
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21243/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27155: [SPARK-17636][SPARK-25557][SQL] Parquet and ORC predicate pushdown in nested fields

2020-01-09 Thread GitBox
AmplabJenkins removed a comment on issue #27155: 
[SPARK-17636][SPARK-25557][SQL] Parquet and ORC predicate pushdown in nested 
fields
URL: https://github.com/apache/spark/pull/27155#issuecomment-572909353
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21243/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26997: [SPARK-30343][SQL] Skip unnecessary checks in RewriteDistinctAggregates

2020-01-09 Thread GitBox
AmplabJenkins removed a comment on issue #26997: [SPARK-30343][SQL] Skip 
unnecessary checks in RewriteDistinctAggregates
URL: https://github.com/apache/spark/pull/26997#issuecomment-572909328
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26997: [SPARK-30343][SQL] Skip unnecessary checks in RewriteDistinctAggregates

2020-01-09 Thread GitBox
AmplabJenkins commented on issue #26997: [SPARK-30343][SQL] Skip unnecessary 
checks in RewriteDistinctAggregates
URL: https://github.com/apache/spark/pull/26997#issuecomment-572909328
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27155: [SPARK-17636][SPARK-25557][SQL] Parquet and ORC predicate pushdown in nested fields

2020-01-09 Thread GitBox
AmplabJenkins removed a comment on issue #27155: 
[SPARK-17636][SPARK-25557][SQL] Parquet and ORC predicate pushdown in nested 
fields
URL: https://github.com/apache/spark/pull/27155#issuecomment-572909346
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26997: [SPARK-30343][SQL] Skip unnecessary checks in RewriteDistinctAggregates

2020-01-09 Thread GitBox
AmplabJenkins removed a comment on issue #26997: [SPARK-30343][SQL] Skip 
unnecessary checks in RewriteDistinctAggregates
URL: https://github.com/apache/spark/pull/26997#issuecomment-572909333
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21244/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26997: [SPARK-30343][SQL] Skip unnecessary checks in RewriteDistinctAggregates

2020-01-09 Thread GitBox
AmplabJenkins commented on issue #26997: [SPARK-30343][SQL] Skip unnecessary 
checks in RewriteDistinctAggregates
URL: https://github.com/apache/spark/pull/26997#issuecomment-572909333
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21244/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit

2020-01-09 Thread GitBox
AmplabJenkins removed a comment on issue #27159: [DO-NOT-MERGE] Testing fix for 
pyspark test test_memory_limit
URL: https://github.com/apache/spark/pull/27159#issuecomment-572909264
 
 
   Merged build finished. Test FAILed.





[GitHub] [spark] AmplabJenkins commented on issue #27155: [SPARK-17636][SPARK-25557][SQL] Parquet and ORC predicate pushdown in nested fields

2020-01-09 Thread GitBox
AmplabJenkins commented on issue #27155: [SPARK-17636][SPARK-25557][SQL] 
Parquet and ORC predicate pushdown in nested fields
URL: https://github.com/apache/spark/pull/27155#issuecomment-572909346
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit

2020-01-09 Thread GitBox
AmplabJenkins commented on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark 
test test_memory_limit
URL: https://github.com/apache/spark/pull/27159#issuecomment-572909264
 
 
   Merged build finished. Test FAILed.





[GitHub] [spark] AmplabJenkins commented on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit

2020-01-09 Thread GitBox
AmplabJenkins commented on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark 
test test_memory_limit
URL: https://github.com/apache/spark/pull/27159#issuecomment-572909274
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116451/
   Test FAILed.





[GitHub] [spark] SparkQA commented on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit

2020-01-09 Thread GitBox
SparkQA commented on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test 
test_memory_limit
URL: https://github.com/apache/spark/pull/27159#issuecomment-572908996
 
 
   **[Test build #116451 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116451/testReport)**
 for PR 27159 at commit 
[`b77f9d8`](https://github.com/apache/spark/commit/b77f9d8759eddfc5c56c8de14037cdcf5dc0c989).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA removed a comment on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit

2020-01-09 Thread GitBox
SparkQA removed a comment on issue #27159: [DO-NOT-MERGE] Testing fix for 
pyspark test test_memory_limit
URL: https://github.com/apache/spark/pull/27159#issuecomment-572879942
 
 
   **[Test build #116451 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116451/testReport)**
 for PR 27159 at commit 
[`b77f9d8`](https://github.com/apache/spark/commit/b77f9d8759eddfc5c56c8de14037cdcf5dc0c989).





[GitHub] [spark] SparkQA commented on issue #26997: [SPARK-30343][SQL] Skip unnecessary checks in RewriteDistinctAggregates

2020-01-09 Thread GitBox
SparkQA commented on issue #26997: [SPARK-30343][SQL] Skip unnecessary checks 
in RewriteDistinctAggregates
URL: https://github.com/apache/spark/pull/26997#issuecomment-572908922
 
 
   **[Test build #116457 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116457/testReport)**
 for PR 26997 at commit 
[`f4eccd0`](https://github.com/apache/spark/commit/f4eccd04d9833448c71f6820c3852ce0c635e3b2).





[GitHub] [spark] SparkQA commented on issue #27155: [SPARK-17636][SPARK-25557][SQL] Parquet and ORC predicate pushdown in nested fields

2020-01-09 Thread GitBox
SparkQA commented on issue #27155: [SPARK-17636][SPARK-25557][SQL] Parquet and 
ORC predicate pushdown in nested fields
URL: https://github.com/apache/spark/pull/27155#issuecomment-572908937
 
 
   **[Test build #116456 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116456/testReport)**
 for PR 27155 at commit 
[`3e78632`](https://github.com/apache/spark/commit/3e78632789afa643bc92ad425ac31de9d5afb29e).





[GitHub] [spark] dongjoon-hyun commented on issue #27155: [SPARK-17636][SPARK-25557][SQL] Parquet and ORC predicate pushdown in nested fields

2020-01-09 Thread GitBox
dongjoon-hyun commented on issue #27155: [SPARK-17636][SPARK-25557][SQL] 
Parquet and ORC predicate pushdown in nested fields
URL: https://github.com/apache/spark/pull/27155#issuecomment-572907049
 
 
   ok to test





[GitHub] [spark] AmplabJenkins removed a comment on issue #27155: [SPARK-17636][SPARK-25557][SQL] Parquet and ORC predicate pushdown in nested fields

2020-01-09 Thread GitBox
AmplabJenkins removed a comment on issue #27155: 
[SPARK-17636][SPARK-25557][SQL] Parquet and ORC predicate pushdown in nested 
fields
URL: https://github.com/apache/spark/pull/27155#issuecomment-572713017
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] maropu commented on issue #26997: [SPARK-30343][SQL] Skip unnecessary checks in RewriteDistinctAggregates

2020-01-09 Thread GitBox
maropu commented on issue #26997: [SPARK-30343][SQL] Skip unnecessary checks in 
RewriteDistinctAggregates
URL: https://github.com/apache/spark/pull/26997#issuecomment-572907233
 
 
   cc: @dongjoon-hyun 





[GitHub] [spark] maropu commented on issue #26997: [SPARK-30343][SQL] Skip unnecessary checks in RewriteDistinctAggregates

2020-01-09 Thread GitBox
maropu commented on issue #26997: [SPARK-30343][SQL] Skip unnecessary checks in 
RewriteDistinctAggregates
URL: https://github.com/apache/spark/pull/26997#issuecomment-572907195
 
 
   retest this please




