[GitHub] spark issue #20816: [SPARK-21479][SQL] Outer join filter pushdown in null su...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20816 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1632/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20816: [SPARK-21479][SQL] Outer join filter pushdown in null su...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20816 **[Test build #88405 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88405/testReport)** for PR 20816 at commit [`7fe9329`](https://github.com/apache/spark/commit/7fe93295df5627f2fc4e712b71aa9ce75383d410).
[GitHub] spark issue #20816: [SPARK-21479][SQL] Outer join filter pushdown in null su...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20816 Merged build finished. Test PASSed.
[GitHub] spark issue #20852: [SPARK-23728][BRANCH-2.3] Fix ML tests with expected exc...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20852 (@attilapiros, just in case it should be manually closed)
[GitHub] spark issue #20863: [SPARK-23691][PYTHON][BRANCH-2.3] Use sql_conf util in P...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20863 **[Test build #88404 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88404/testReport)** for PR 20863 at commit [`3056e3c`](https://github.com/apache/spark/commit/3056e3c469209d72c97046f9668e30e0dbc5818d).
[GitHub] spark issue #20863: [SPARK-23691][PYTHON][BRANCH-2.3] Use sql_conf util in P...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20863 Merged build finished. Test PASSed.
[GitHub] spark issue #20863: [SPARK-23691][PYTHON][BRANCH-2.3] Use sql_conf util in P...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20863 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1631/ Test PASSed.
[GitHub] spark issue #20863: [SPARK-23691][PYTHON][BRANCH-2.3] Use sql_conf util in P...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20863 cc @ueshin and @BryanCutler
[GitHub] spark pull request #20863: [SPARK-23691][PYTHON][BRANCH-2.3] Use sql_conf ut...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20863#discussion_r175660234

--- Diff: python/pyspark/sql/tests.py ---
@@ -2409,17 +2432,13 @@ def test_join_without_on(self):
         df1 = self.spark.range(1).toDF("a")
         df2 = self.spark.range(1).toDF("b")
-        try:
--- End diff --

The other diffs are basically the same.
[GitHub] spark pull request #20863: [SPARK-23691][PYTHON][BRANCH-2.3] Use sql_conf ut...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20863#discussion_r175660221

--- Diff: python/pyspark/sql/tests.py ---
@@ -201,6 +202,28 @@ def assertPandasEqual(self, expected, result):
                "\n\nResult:\n%s\n%s" % (result, result.dtypes))
         self.assertTrue(expected.equals(result), msg=msg)
+    @contextmanager
--- End diff --

This was extracted alone from https://github.com/apache/spark/commit/d6632d185e147fcbe6724545488ad80dce20277e
[GitHub] spark pull request #20863: [SPARK-23691][PYTHON][BRANCH-2.3] Use sql_conf ut...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/20863

[SPARK-23691][PYTHON][BRANCH-2.3] Use sql_conf util in PySpark tests where possible

## What changes were proposed in this pull request?

This PR backports https://github.com/apache/spark/pull/20830 to reduce the diff against master and to restore default configuration values in PySpark tests.

https://github.com/apache/spark/commit/d6632d185e147fcbe6724545488ad80dce20277e added a useful util. This backport extracts and brings over this util:

```python
@contextmanager
def sql_conf(self, pairs):
    ...
```

to allow configurations to be set and unset within a block:

```python
with self.sql_conf({"spark.blah.blah.blah": "blah"}):
    # test codes
    ...
```

This PR proposes to use this util where possible in PySpark tests.

Note that there already appear to be a few places in the unittest classes that change configurations without restoring the original values, which can affect other tests.

## How was this patch tested?

Likewise, manually tested via:

```
./run-tests --modules=pyspark-sql --python-executables=python2
./run-tests --modules=pyspark-sql --python-executables=python3
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark backport-20830

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20863.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20863

commit 4e8045cbddc39a5b8f3488b832a1ac092da68de9
Author: hyukjinkwon
Date: 2018-03-20T04:25:37Z

[SPARK-23691][PYTHON] Use sql_conf util in PySpark tests where possible

https://github.com/apache/spark/commit/d6632d185e147fcbe6724545488ad80dce20277e added a useful util

```python
@contextmanager
def sql_conf(self, pairs):
    ...
```

to allow configurations to be set and unset within a block:

```python
with self.sql_conf({"spark.blah.blah.blah": "blah"}):
    # test codes
    ...
```

This PR proposes to use this util where possible in PySpark tests.

Note that there already appear to be a few places in the unittest classes that change configurations without restoring the original values, which can affect other tests.

Manually tested via:

```
./run-tests --modules=pyspark-sql --python-executables=python2
./run-tests --modules=pyspark-sql --python-executables=python3
```

Author: hyukjinkwon

Closes #20830 from HyukjinKwon/cleanup-sql-conf.

commit 3056e3c469209d72c97046f9668e30e0dbc5818d
Author: hyukjinkwon
Date: 2018-03-20T05:27:26Z

Extracts and brings sql_conf util
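For readers who have not seen the util, the set/restore behaviour described in this pull request can be sketched as follows. This is only a minimal illustration, not the code from the referenced commit: `FakeConf` is a hypothetical stand-in for a session's runtime configuration, and this `sql_conf` takes the configuration object as an explicit argument instead of being a method on a test class.

```python
from contextlib import contextmanager


class FakeConf:
    """Hypothetical stand-in for a session's runtime configuration."""

    def __init__(self):
        self._values = {}

    def get(self, key, default=None):
        return self._values.get(key, default)

    def set(self, key, value):
        self._values[key] = value

    def unset(self, key):
        self._values.pop(key, None)


@contextmanager
def sql_conf(conf, pairs):
    """Set the given key/value pairs on conf and restore the old state on exit."""
    assert isinstance(pairs, dict), "pairs should be a dictionary."
    keys = list(pairs)
    # Remember the previous values; None means the key was not set before.
    old_values = [conf.get(key, None) for key in keys]
    for key, value in pairs.items():
        conf.set(key, value)
    try:
        yield
    finally:
        # Runs even if the body raised, so later tests see the original state.
        for key, old_value in zip(keys, old_values):
            if old_value is None:
                conf.unset(key)
            else:
                conf.set(key, old_value)


conf = FakeConf()
conf.set("spark.example.a", "1")
with sql_conf(conf, {"spark.example.a": "2", "spark.example.b": "x"}):
    inside = (conf.get("spark.example.a"), conf.get("spark.example.b"))
after = (conf.get("spark.example.a"), conf.get("spark.example.b"))
```

The `try`/`finally` is the part that matters for test isolation: a failing assertion inside the block still leaves the configuration as it was found.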
[GitHub] spark issue #20774: [SPARK-23549][SQL] Cast to timestamp when comparing time...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20774 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1630/ Test PASSed.
[GitHub] spark issue #20774: [SPARK-23549][SQL] Cast to timestamp when comparing time...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20774 Build finished. Test PASSed.
[GitHub] spark issue #20774: [SPARK-23549][SQL] Cast to timestamp when comparing time...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20774 **[Test build #88403 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88403/testReport)** for PR 20774 at commit [`a3dc357`](https://github.com/apache/spark/commit/a3dc35716bc73376155a6ea3594cbe575dac0c46).
[GitHub] spark pull request #20844: [SPARK-23707][SQL] Fresh 'initRange' name to avoi...
Github user ConeyLiu commented on a diff in the pull request: https://github.com/apache/spark/pull/20844#discussion_r175658889

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
@@ -396,9 +396,11 @@ case class RangeExec(range: org.apache.spark.sql.catalyst.plans.logical.Range)
     // The default size of a batch, which must be positive integer
     val batchSize = 1000
-    val initRangeFuncName = ctx.addNewFunction("initRange",
+    val initRange = ctx.freshName("initRange")
+
+    val initRangeFuncName = ctx.addNewFunction(initRange,
       s"""
-        | private void initRange(int idx) {
+        | private void ${initRange}(int idx) {
--- End diff --

Hi @cloud-fan, before adding the comments, I have a question: why do we still need an `exchange` when we join two `spark.range(1, 10, 1, 1)`? Since both `range`s have only one partition, is the `exchange` really needed?
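The change in the hunk above replaces a fixed generated function name with one obtained from a name allocator, so that two operators emitting the same helper into one generated class do not collide. A rough sketch of that idea, under the assumption that a per-prefix counter is enough; `NameAllocator` and `gen_init_range` are hypothetical names for illustration and are not Spark's codegen API:

```python
from collections import defaultdict


class NameAllocator:
    """Hypothetical allocator that hands out a unique name for each request."""

    def __init__(self):
        self._next_id = defaultdict(int)

    def fresh_name(self, prefix):
        # Always append a counter so repeated requests never return the same name.
        n = self._next_id[prefix]
        self._next_id[prefix] += 1
        return f"{prefix}_{n}"


def gen_init_range(names):
    """Emit a (function name, source text) pair using a fresh function name."""
    fn = names.fresh_name("initRange")
    return fn, f"private void {fn}(int idx) {{ /* ... */ }}"


names = NameAllocator()
first_name, first_src = gen_init_range(names)
second_name, second_src = gen_init_range(names)
```

With a hard-coded name, the two emitted functions would have had identical signatures in the same generated class; with the allocator they differ by suffix.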
[GitHub] spark issue #19819: [SPARK-22606][Streaming]Add threadId to the CachedKafkaC...
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/19819 It will create a new consumer for each thread. This could be quite resource-consuming when several topics are shared across thread pools.
[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20433 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88400/ Test FAILed.
[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20433 Merged build finished. Test FAILed.
[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20433 **[Test build #88400 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88400/testReport)** for PR 20433 at commit [`e780fd2`](https://github.com/apache/spark/commit/e780fd2ae562cd7f9ade80cc28e0ca44f6b1cf7d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20830: [SPARK-23691][PYTHON] Use sql_conf util in PySpark tests...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20830 A .. https://github.com/apache/spark/commit/d6632d185e147fcbe6724545488ad80dce20277e added the util into master only ...
[GitHub] spark pull request #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and Buffe...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/20850#discussion_r175657704

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedUnsafeProjection.scala ---
@@ -178,81 +171,76 @@ object InterpretedUnsafeProjection extends UnsafeProjectionCreator {
       case StructType(fields) =>
         val numFields = fields.length
-        val rowWriter = new UnsafeRowWriter(bufferHolder, numFields)
-        val structWriter = generateStructWriter(bufferHolder, rowWriter, fields)
+        val rowWriter = new UnsafeRowWriter(writer, numFields)
+        val structWriter = generateStructWriter(rowWriter, fields)
         (v, i) => {
-          val tmpCursor = bufferHolder.cursor
+          rowWriter.markCursor()
           v.getStruct(i, fields.length) match {
             case row: UnsafeRow =>
               writeUnsafeData(
-                bufferHolder,
+                rowWriter,
                 row.getBaseObject,
                 row.getBaseOffset,
                 row.getSizeInBytes)
             case row =>
               // Nested struct. We don't know where this will start because a row can be
               // variable length, so we need to update the offsets and zero out the bit mask.
-              rowWriter.reset()
+              rowWriter.resetRowWriter()
               structWriter.apply(row)
           }
-          writer.setOffsetAndSize(i, tmpCursor, bufferHolder.cursor - tmpCursor)
+          writer.setOffsetAndSizeFromMark(i)
         }
       case ArrayType(elementType, containsNull) =>
-        val arrayWriter = new UnsafeArrayWriter
-        val elementSize = getElementSize(elementType)
+        val arrayWriter = new UnsafeArrayWriter(writer, getElementSize(elementType))
         val elementWriter = generateFieldWriter(
-          bufferHolder,
           arrayWriter,
           elementType,
           containsNull)
         (v, i) => {
-          val tmpCursor = bufferHolder.cursor
-          writeArray(bufferHolder, arrayWriter, elementWriter, v.getArray(i), elementSize)
-          writer.setOffsetAndSize(i, tmpCursor, bufferHolder.cursor - tmpCursor)
+          arrayWriter.markCursor()
--- End diff --

From a performance point of view, this abstraction may have more performance impact, since we move a temporary value on the local frame into [that on Java stack](https://github.com/apache/spark/pull/20850/files#diff-e68c5a074209b9a20ee2aa42936571ceR103):

```
arrayWriter.markCursor()
writeArray(arrayWriter, elementWriter, v.getArray(i))
writer.setOffsetAndSizeFromMark(i)
```

Is this implementation a good enough balance between performance and abstraction? Or would it be better to do it like this?

```
val mark = arrayWriter.cursor()
writeArray(arrayWriter, elementWriter, v.getArray(i))
writer.setOffsetAndSizeFromMark(i, mark)
```

@maropo @hvanhovell WDYT?
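The two API shapes being compared in this review comment can be put side by side in a small model. This is only a sketch of the shapes under discussion, not Spark's writer classes: `CursorWriter` is a hypothetical single-object simplification (the real code involves separate writers sharing a buffer), and it says nothing about JVM performance.

```python
class CursorWriter:
    """Hypothetical, simplified stand-in for a writer over a growing buffer."""

    def __init__(self):
        self.cursor = 0     # next free position in the buffer
        self.slots = {}     # ordinal -> (offset, size)
        self._mark = None   # variant A keeps the mark as object state

    def write(self, num_bytes):
        self.cursor += num_bytes

    def mark_cursor(self):
        self._mark = self.cursor

    def set_offset_and_size_from_mark(self, ordinal, mark=None):
        # Variant A reads the stored mark; variant B passes it in explicitly.
        start = self._mark if mark is None else mark
        self.slots[ordinal] = (start, self.cursor - start)


# Variant A: the mark lives in a field of the writer.
a = CursorWriter()
a.write(16)
a.mark_cursor()
a.write(24)
a.set_offset_and_size_from_mark(0)

# Variant B: the mark stays in a local variable at the call site.
b = CursorWriter()
b.write(16)
mark = b.cursor
b.write(24)
b.set_offset_and_size_from_mark(0, mark)
```

Both variants record the same offset and size; the difference raised in the comment is only where the intermediate value lives while the nested data is being written.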
[GitHub] spark pull request #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and Buffe...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/20850#discussion_r17565

--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java ---
@@ -86,11 +88,39 @@ public void grow(int neededSize) {
     }
   }
-  public void reset() {
+  byte[] buffer() {
+    return buffer;
+  }
+
+  int getCursor() {
+    return cursor;
+  }
+
+  void incrementCursor(int val) {
+    cursor += val;
+  }
+
+  int pushCursor() {
--- End diff --

Since one `BufferHolder` is shared by multiple `UnsafeWriter`s, it seems simpler to store the cursors in the `BufferHolder`. WDYT?
[GitHub] spark issue #20830: [SPARK-23691][PYTHON] Use sql_conf util in PySpark tests...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20830 Sure, will open a PR soon.
[GitHub] spark issue #20830: [SPARK-23691][PYTHON] Use sql_conf util in PySpark tests...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/20830 Hmm, it looks like the conflict is just one block with group agg tests, probably not a big deal - you want to take a look?
[GitHub] spark issue #20830: [SPARK-23691][PYTHON] Use sql_conf util in PySpark tests...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20830 But I am willing to do this if you think it's better to do this. No objection.
[GitHub] spark issue #20830: [SPARK-23691][PYTHON] Use sql_conf util in PySpark tests...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20830 Yup, that was exactly what I thought. I think it's also fine not to bother backporting, since it has conflicts.
[GitHub] spark issue #20830: [SPARK-23691][PYTHON] Use sql_conf util in PySpark tests...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/20830 The cherry-pick to branch-2.3 did have some conflicts. Just to check the reason for backporting: even though this isn't a bug, it's pretty safe and will help keep things in line, so there are fewer conflicts in future backports?
[GitHub] spark issue #20830: [SPARK-23691][PYTHON] Use sql_conf util in PySpark tests...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20830 Thanks for reviewing and merging this @ueshin, @felixcheung, @BryanCutler and @dongjoon-hyun. (Just FYI, I usually resolve JIRAs manually when I accidentally fail to take an action with the merge script. I think that's fine.)
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20579 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88396/ Test PASSed.
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20579 Merged build finished. Test PASSed.
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20579 **[Test build #88396 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88396/testReport)** for PR 20579 at commit [`3392305`](https://github.com/apache/spark/commit/339230570eab374619477f7c0d68f3451d7ff90b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20830: [SPARK-23691][PYTHON] Use sql_conf util in PySpark tests...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/20830 Merged to master! (I think it went ok..) Thanks @HyukjinKwon !!
[GitHub] spark pull request #20830: [SPARK-23691][PYTHON] Use sql_conf util in PySpar...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20830
[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20851#discussion_r175651639

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -72,6 +82,14 @@ private[parquet] object ParquetFilters {
       (n: String, v: Any) => FilterApi.notEq(
         binaryColumn(n),
         Option(v).map(b => Binary.fromReusedByteArray(v.asInstanceOf[Array[Byte]])).orNull)
+    case DateType if SQLConf.get.parquetFilterPushDownDate =>
+      (n: String, v: Any) => {
--- End diff --

nit:

```
(n: String, v: Any) => FilterApi.notEq(
  intColumn(n),
  Option(v).map { d =>
    DateTimeUtils.fromJavaDate(d.asInstanceOf[java.sql.Date]).asInstanceOf[Integer]
  }.orNull)
```
[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20851#discussion_r175649724

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
@@ -76,7 +77,9 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
       expected: Seq[Row]): Unit = {
     val output = predicate.collect { case a: Attribute => a }.distinct
-    withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> "true") {
+    withSQLConf(
+      SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> "true",
+      SQLConf.PARQUET_FILTER_PUSHDOWN_DATE_ENABLED.key -> "true") {
--- End diff --

nit:

```scala
withSQLConf(
  SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> "true",
  SQLConf.PARQUET_FILTER_PUSHDOWN_DATE_ENABLED.key -> "true",
  SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "false") {
```
[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20851#discussion_r175650228

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
@@ -313,6 +316,36 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
     }
   }
+  test("filter pushdown - date") {
+    implicit class IntToDate(int: Int) {
--- End diff --

Shall we pass a string here and convert it into a date?
[GitHub] spark issue #20862: [SPARK-23744][CORE]Fix memory leak in ReadableChannelFil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20862 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1629/ Test PASSed.
[GitHub] spark issue #20862: [SPARK-23744][CORE]Fix memory leak in ReadableChannelFil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20862 Merged build finished. Test PASSed.
[GitHub] spark issue #20862: [SPARK-23744][CORE]Fix memory leak in ReadableChannelFil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20862 **[Test build #88402 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88402/testReport)** for PR 20862 at commit [`5b58c57`](https://github.com/apache/spark/commit/5b58c57607551328c893a3857717e4b159ecf841).
[GitHub] spark pull request #20862: [SPARK-23744][CORE]Fix memory leak in ReadableCha...
GitHub user 10110346 opened a pull request: https://github.com/apache/spark/pull/20862

[SPARK-23744][CORE]Fix memory leak in ReadableChannelFileRegion

## What changes were proposed in this pull request?

In the class `ReadableChannelFileRegion`, the `buffer` is direct memory. We should modify `deallocate` to free it; `deallocate` will be called by `release`.

## How was this patch tested?

Existing unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/10110346/spark leakmem

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20862.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20862

commit 5b58c57607551328c893a3857717e4b159ecf841
Author: liuxian
Date: 2018-03-20T03:19:59Z

fix
[GitHub] spark issue #20827: [SPARK-23666][SQL] Do not display exprIds of Alias in us...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20827 **[Test build #88401 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88401/testReport)** for PR 20827 at commit [`cd3dcc6`](https://github.com/apache/spark/commit/cd3dcc6299888b8119e2185fb6f79e8445631bca).
[GitHub] spark issue #20860: [SPARK-23743][SQL] Changed a comparison logic from conta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20860 Merged build finished. Test PASSed.
[GitHub] spark issue #20827: [SPARK-23666][SQL] Do not display exprIds of Alias in us...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20827 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1628/ Test PASSed.
[GitHub] spark issue #20827: [SPARK-23666][SQL] Do not display exprIds of Alias in us...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20827 Merged build finished. Test PASSed.
[GitHub] spark issue #20860: [SPARK-23743][SQL] Changed a comparison logic from conta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20860 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88397/ Test PASSed.
[GitHub] spark issue #20860: [SPARK-23743][SQL] Changed a comparison logic from conta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20860 **[Test build #88397 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88397/testReport)** for PR 20860 at commit [`192ce30`](https://github.com/apache/spark/commit/192ce305f05d4280c5c35b94a3666d313dab2733). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17272: [SPARK-19724][SQL]create a managed table with an existed...
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/17272 @gatorsmile I will take it over :)
[GitHub] spark issue #20757: [SPARK-23595][SQL] ValidateExternalType should support i...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20757 ping
[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20433 **[Test build #88400 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88400/testReport)** for PR 20433 at commit [`e780fd2`](https://github.com/apache/spark/commit/e780fd2ae562cd7f9ade80cc28e0ca44f6b1cf7d).
[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20433 Merged build finished. Test PASSed.
[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20433 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1627/ Test PASSed.
[GitHub] spark issue #20861: [SPARK-23599][SQL] Use RandomUUIDGenerator in Uuid expre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20861 Merged build finished. Test PASSed.
[GitHub] spark issue #20861: [SPARK-23599][SQL] Use RandomUUIDGenerator in Uuid expre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20861 **[Test build #88399 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88399/testReport)** for PR 20861 at commit [`306dbe8`](https://github.com/apache/spark/commit/306dbe8e26f2045b0d133e07455dedae058c0311).
[GitHub] spark issue #20861: [SPARK-23599][SQL] Use RandomUUIDGenerator in Uuid expre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20861 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1626/ Test PASSed.
[GitHub] spark pull request #20861: [SPARK-23599][SQL] Use RandomUUIDGenerator in Uui...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/20861

[SPARK-23599][SQL] Use RandomUUIDGenerator in Uuid expression

## What changes were proposed in this pull request?

As stated in Jira, the current `Uuid` expression is problematic because it uses `java.util.UUID.randomUUID` for UUID generation. This patch uses the newly added `RandomUUIDGenerator` instead, so that `Uuid` is deterministic between retries.

## How was this patch tested?

Added unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-23599-2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20861.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20861

commit 306dbe8e26f2045b0d133e07455dedae058c0311
Author: Liang-Chi Hsieh
Date: 2018-03-20T03:11:33Z

    Use new UUID generator in Uuid expression.
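A minimal sketch of why a seeded generator gives the same UUIDs on a retry, which `java.util.UUID.randomUUID` cannot do. The `SeededUuidGenerator` class below is a hypothetical illustration, not Spark's `RandomUUIDGenerator`; it only assumes that the generator is driven by a PRNG whose seed is fixed per task.

```scala
import java.util.{Random, UUID}

object SeededUuidSketch {
  // Hypothetical helper, not Spark's RandomUUIDGenerator: builds a version-4
  // style UUID from a seeded PRNG, so the same seed yields the same sequence.
  final class SeededUuidGenerator(seed: Long) {
    private val rng = new Random(seed)

    def next(): UUID = {
      // Clear the version nibble, then set it to 4 (randomly generated UUID).
      val msb = (rng.nextLong() & 0xFFFFFFFFFFFF0FFFL) | 0x0000000000004000L
      // Clear the top two bits, then set them to binary 10 (RFC 4122 variant).
      val lsb = (rng.nextLong() & 0x3FFFFFFFFFFFFFFFL) | 0x8000000000000000L
      new UUID(msb, lsb)
    }
  }
}
```

Two generators created with the same seed produce identical sequences, so a retried task that reuses its seed would regenerate the same values.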
[GitHub] spark issue #20796: [SPARK-23649][SQL] Skipping chars disallowed in UTF-8
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20796 Merged build finished. Test FAILed.
[GitHub] spark issue #20796: [SPARK-23649][SQL] Skipping chars disallowed in UTF-8
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20796 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88394/ Test FAILed.
[GitHub] spark issue #20796: [SPARK-23649][SQL] Skipping chars disallowed in UTF-8
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20796 **[Test build #88394 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88394/testReport)** for PR 20796 at commit [`5557a80`](https://github.com/apache/spark/commit/5557a80d4674e929332d9441342e5b90e314eb45). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19666: [SPARK-22451][ML] Reduce decision tree aggregate ...
Github user tengpeng commented on a diff in the pull request: https://github.com/apache/spark/pull/19666#discussion_r175646373 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/DecisionTreeMetadata.scala --- @@ -152,15 +152,13 @@ private[spark] object DecisionTreeMetadata extends Logging { // TODO(SPARK-9957): Handle this properly by filtering out those features. if (numCategories > 1) { // Decide if some categorical features should be treated as unordered features, --- End diff -- Change , to .
[GitHub] spark pull request #19666: [SPARK-22451][ML] Reduce decision tree aggregate ...
Github user tengpeng commented on a diff in the pull request: https://github.com/apache/spark/pull/19666#discussion_r175646335 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/DecisionTreeMetadata.scala --- @@ -152,15 +152,13 @@ private[spark] object DecisionTreeMetadata extends Logging { // TODO(SPARK-9957): Handle this properly by filtering out those features. if (numCategories > 1) { // Decide if some categorical features should be treated as unordered features, --- End diff -- Change , to .
[GitHub] spark issue #20853: [SPARK-23729][CORE] Respect URI fragment when resolving ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20853 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88389/ Test PASSed.
[GitHub] spark issue #20853: [SPARK-23729][CORE] Respect URI fragment when resolving ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20853 Merged build finished. Test PASSed.
[GitHub] spark issue #20853: [SPARK-23729][CORE] Respect URI fragment when resolving ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20853 **[Test build #88389 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88389/testReport)** for PR 20853 at commit [`8a12452`](https://github.com/apache/spark/commit/8a124522519ed4f8fb750555f1a596c9f97b6947). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20827: [SPARK-23666][SQL] Do not display exprIds of Alias in us...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20827 Merged build finished. Test PASSed.
[GitHub] spark issue #20827: [SPARK-23666][SQL] Do not display exprIds of Alias in us...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20827 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88393/ Test PASSed.
[GitHub] spark issue #20827: [SPARK-23666][SQL] Do not display exprIds of Alias in us...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20827 **[Test build #88393 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88393/testReport)** for PR 20827 at commit [`bee3711`](https://github.com/apache/spark/commit/bee3711074a7d34cf19e8794f837b70eddaffbe0). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class PrettyAttribute(`
[GitHub] spark pull request #20818: [SPARK-23675][WEB-UI]Title add spark logo, use sp...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/20818#discussion_r175644012 --- Diff: core/src/main/scala/org/apache/spark/ui/UIUtils.scala --- @@ -224,6 +224,7 @@ private[spark] object UIUtils extends Logging { {commonHeaderNodes} {if (showVisualization) vizHeaderNodes else Seq.empty} {if (useDataTables) dataTablesHeaderNodes else Seq.empty} + --- End diff -- Seems it should be `prependBaseUri("/static/spark-logo-77x50px-hd.png")`
[GitHub] spark pull request #20818: [SPARK-23675][WEB-UI]Title add spark logo, use sp...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/20818#discussion_r175644032 --- Diff: core/src/main/scala/org/apache/spark/ui/UIUtils.scala --- @@ -265,6 +266,7 @@ private[spark] object UIUtils extends Logging { {commonHeaderNodes} {if (useDataTables) dataTablesHeaderNodes else Seq.empty} + --- End diff -- ditto
[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20687 Merged build finished. Test PASSed.
[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20687 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88391/ Test PASSed.
[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20687 **[Test build #88391 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88391/testReport)** for PR 20687 at commit [`5926301`](https://github.com/apache/spark/commit/592630148af19adbb72703dd1ff49f82c33087d2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18666: [SPARK-21449][SQL][Hive]Close HiveClient's SessionState ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18666 Merged build finished. Test PASSed.
[GitHub] spark issue #18666: [SPARK-21449][SQL][Hive]Close HiveClient's SessionState ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18666 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1625/ Test PASSed.
[GitHub] spark pull request #19001: [SPARK-19256][SQL] Hive bucketing support
Github user chrysan commented on a diff in the pull request: https://github.com/apache/spark/pull/19001#discussion_r175640958

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala ---
@@ -184,6 +189,43 @@ case class InsertIntoHadoopFsRelationCommand(
     Seq.empty[Row]
   }
+  private def getBucketIdExpression(dataColumns: Seq[Attribute]): Option[Expression] = {
+    bucketSpec.map { spec =>
+      val bucketColumns = spec.bucketColumnNames.map(c => dataColumns.find(_.name == c).get)
+      // Use `HashPartitioning.partitionIdExpression` as our bucket id expression, so that we can
+      // guarantee the data distribution is same between shuffle and bucketed data source, which
+      // enables us to only shuffle one side when join a bucketed table and a normal one.
+      HashPartitioning(
+        bucketColumns,
+        spec.numBuckets,
+        classOf[Murmur3Hash]
+      ).partitionIdExpression
+    }
+  }
+
+  /**
+   * How is `requiredOrdering` determined ?
--- End diff --

Why does the definition of `requiredOrdering` here differ from the one in `InsertIntoHiveTable`?
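The idea behind the quoted `getBucketIdExpression` (hash the bucket columns, then map the hash onto a fixed number of buckets) can be sketched in plain Scala. This is an illustration under assumptions: it uses `scala.util.hashing.MurmurHash3` from the standard library, which is not the same function as Spark's `Murmur3Hash` expression and will not produce Spark-compatible bucket ids, and `bucketId` and `pmod` are hypothetical helper names.

```scala
import scala.util.hashing.MurmurHash3

object BucketIdSketch {
  // Non-negative modulo, so negative hash codes still land in [0, n).
  def pmod(a: Int, n: Int): Int = {
    val r = a % n
    if (r < 0) r + n else r
  }

  // Hypothetical stand-in for a bucket id expression: hash the bucket
  // column values, then reduce the hash modulo the number of buckets.
  def bucketId(bucketValues: Seq[Any], numBuckets: Int): Int =
    pmod(MurmurHash3.seqHash(bucketValues), numBuckets)
}
```

Because the same function would be used both when shuffling and when writing bucketed files, rows with equal bucket column values end up with the same id on both sides, which is the property the code comment in the diff relies on.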
[GitHub] spark issue #18666: [SPARK-21449][SQL][Hive]Close HiveClient's SessionState ...
Github user yaooqinn commented on the issue: https://github.com/apache/spark/pull/18666

@gatorsmile, would you please take a look at this? This PR mainly aims to close HiveSessionState explicitly so that the following directories are deleted:

- `hive.downloaded.resources.dir`, which points to `"${system:java.io.tmpdir}" + File.separator + "${hive.session.id}_resources"` by default
- `hive.exec.local.scratchdir`, which points to `"${system:java.io.tmpdir}" + File.separator + "${system:user.name}"` by default
- some other directories that are used only by Hive and have no deletion hook on shutdown

The code below shows how HiveSessionState creates `hive.downloaded.resources.dir`; note that `isCleanUp` is set to `false`.

```java
// 3. Download resources dir
path = new Path(HiveConf.getVar(conf, HiveConf.ConfVars.DOWNLOADED_RESOURCES_DIR));
createPath(conf, path, scratchDirPermission, true, false); // last argument: isCleanUp = false
```

Plenty of unused directories are left behind after submitting many Hive-enabled Spark applications.

![popo_2018-03-20 10-28-34](https://user-images.githubusercontent.com/8326978/37632505-7eacbec2-2c29-11e8-94b5-229ba193339f.jpg)
[GitHub] spark issue #19108: [SPARK-21898][ML] Feature parity for KolmogorovSmirnovTe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19108 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88398/ Test FAILed.
[GitHub] spark issue #19108: [SPARK-21898][ML] Feature parity for KolmogorovSmirnovTe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19108 Merged build finished. Test FAILed.
[GitHub] spark issue #19108: [SPARK-21898][ML] Feature parity for KolmogorovSmirnovTe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19108 **[Test build #88398 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88398/testReport)** for PR 19108 at commit [`ccd22f5`](https://github.com/apache/spark/commit/ccd22f553a37ba166dd2881cb965edc19ff653fc). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public class JavaKolmogorovSmirnovTestSuite extends SharedSparkSession `
[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20829 Merged build finished. Test PASSed.
[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20829 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88395/ Test PASSed.
[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20829 **[Test build #88395 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88395/testReport)** for PR 20829 at commit [`ab91545`](https://github.com/apache/spark/commit/ab91545bebc6e1d0c5c3cb7c15156d546ad48f81). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20657: [SPARK-23361][yarn] Allow AM to restart after initial to...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/20657 LGTM, just one small comment.
[GitHub] spark issue #18666: [SPARK-21449][SQL][Hive]Close HiveClient's SessionState ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18666 Merged build finished. Test PASSed.
[GitHub] spark issue #18666: [SPARK-21449][SQL][Hive]Close HiveClient's SessionState ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18666 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1624/ Test PASSed.
[GitHub] spark pull request #20657: [SPARK-23361][yarn] Allow AM to restart after ini...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/20657#discussion_r175638637

--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosHadoopDelegationTokenManager.scala ---
@@ -105,7 +105,8 @@ private[spark] class MesosHadoopDelegationTokenManager(
     case e: Exception =>
       // Log the error and try to write new tokens back in an hour
       logWarning("Couldn't broadcast tokens, trying again in an hour", e)
--- End diff --

Shall we update the log message to reflect the configured wait time?
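One way to address the review comment is to build the warning text from the configured wait instead of hard-coding "an hour". The sketch below is only an illustration: `retryMessage` and its `retryWaitMs` parameter are hypothetical names, and how the PR actually exposes the configured value is not shown in the quoted diff.

```scala
import java.util.concurrent.TimeUnit

object RetryLogSketch {
  // Hypothetical helper: formats the warning from the configured wait
  // (given in milliseconds) rather than a fixed "an hour".
  def retryMessage(retryWaitMs: Long): String = {
    val minutes = TimeUnit.MILLISECONDS.toMinutes(retryWaitMs)
    s"Couldn't broadcast tokens, trying again in $minutes minute(s)"
  }
}
```

With this shape, the string passed to `logWarning` stays accurate when the wait is reconfigured.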
[GitHub] spark issue #18666: [SPARK-21449][SQL][Hive]Close HiveClient's SessionState ...
Github user yaooqinn commented on the issue: https://github.com/apache/spark/pull/18666 @samartinucci thanks for the reminder; I have fixed the conflicts.
[GitHub] spark issue #20847: [SPARK-23644][CORE][UI][BACKPORT-2.3] Use absolute path ...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/20847 @mgaido91 this is already merged to branch 2.3. Please close this PR if it is not closed automatically.
[GitHub] spark issue #20847: [SPARK-23644][CORE][UI][BACKPORT-2.3] Use absolute path ...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/20847 Thanks, merging to branch 2.3.
[GitHub] spark issue #19108: [SPARK-21898][ML] Feature parity for KolmogorovSmirnovTe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19108 Merged build finished. Test PASSed.
[GitHub] spark issue #19108: [SPARK-21898][ML] Feature parity for KolmogorovSmirnovTe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19108 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1623/ Test PASSed.
[GitHub] spark issue #20860: [SPARK-23743][SQL] Changed a comparison logic from conta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20860 Merged build finished. Test PASSed.
[GitHub] spark issue #20860: [SPARK-23743][SQL] Changed a comparison logic from conta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20860 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1622/ Test PASSed.
[GitHub] spark issue #20860: [SPARK-23743][SQL] Changed a comparison logic from conta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20860 **[Test build #88397 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88397/testReport)** for PR 20860 at commit [`192ce30`](https://github.com/apache/spark/commit/192ce305f05d4280c5c35b94a3666d313dab2733).
[GitHub] spark issue #19108: [SPARK-21898][ML] Feature parity for KolmogorovSmirnovTe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19108 **[Test build #88398 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88398/testReport)** for PR 19108 at commit [`ccd22f5`](https://github.com/apache/spark/commit/ccd22f553a37ba166dd2881cb965edc19ff653fc).
[GitHub] spark pull request #20860: [SPARK-23743][SQL] Changed a comparison logic fro...
GitHub user jongyoul opened a pull request: https://github.com/apache/spark/pull/20860

[SPARK-23743][SQL] Changed a comparison logic from containing 'slf4j' to starting with 'org.slf4j'

## What changes were proposed in this pull request?

`isSharedClass` decides whether a class can or should be shared. It checks whether the class name contains certain keywords or starts with certain prefixes. With the current logic, unintended behavior can occur when a custom package or class name contains `slf4j`. My guess is that the original intention was to match classes under `org.slf4j`, so it would be better to change the comparison to `name.startsWith("org.slf4j")`.

## How was this patch tested?

This patch should pass all of the current tests and keep all of the current behaviors.

In my case, I'm using ProtobufDeserializer to get a table schema from Hive tables, and some Protobuf packages and names have `slf4j` inside. Without this patch, they cannot be resolved because of a ClassCastException caused by different classloaders.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jongyoul/spark SPARK-23743

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20860.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20860

commit 192ce305f05d4280c5c35b94a3666d313dab2733
Author: Jongyoul Lee
Date: 2018-03-20T01:45:44Z

    Changed a comparison logic from containing 'slf4j' to starting with 'org.slf4j'
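The difference between the two comparisons described in the PR can be shown with plain string checks. This is a sketch, not the actual `isSharedClass` code: the two function names and the example class name `com.example.slf4j.LogRecord` are made up for illustration.

```scala
object SharedClassSketch {
  // Check as described for the old behavior: any name containing "slf4j" matches.
  def matchesByContains(name: String): Boolean = name.contains("slf4j")

  // Check proposed in the PR: only names starting with "org.slf4j" match.
  def matchesByPrefix(name: String): Boolean = name.startsWith("org.slf4j")
}
```

For a user class such as `com.example.slf4j.LogRecord`, `matchesByContains` returns `true` while `matchesByPrefix` returns `false`; both return `true` for `org.slf4j.Logger`. Note that a bare `"org.slf4j"` prefix would also match a sibling package whose name merely begins with those characters, so a prefix with a trailing dot would be stricter.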
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20579 **[Test build #88396 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88396/testReport)** for PR 20579 at commit [`3392305`](https://github.com/apache/spark/commit/339230570eab374619477f7c0d68f3451d7ff90b).
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20579 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1621/ Test PASSed.
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20579 Merged build finished. Test PASSed.
[GitHub] spark pull request #20844: [SPARK-23707][SQL] Fresh 'initRange' name to avoi...
Github user ConeyLiu commented on a diff in the pull request: https://github.com/apache/spark/pull/20844#discussion_r175634287

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
@@ -396,9 +396,11 @@ case class RangeExec(range: org.apache.spark.sql.catalyst.plans.logical.Range)
     // The default size of a batch, which must be positive integer
     val batchSize = 1000
-    val initRangeFuncName = ctx.addNewFunction("initRange",
+    val initRange = ctx.freshName("initRange")
+
+    val initRangeFuncName = ctx.addNewFunction(initRange,
       s"""
-        | private void initRange(int idx) {
+        | private void ${initRange}(int idx) {
--- End diff --

OK, I can just add some comments and keep the code unchanged. I changed it here only to make the code more robust.
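The robustness concern in this diff is name collisions between generated functions. A much simplified sketch of a fresh-name allocator is shown below; `NameAllocator` is hypothetical and is not Spark's `CodegenContext`, whose `freshName` may format names differently.

```scala
import scala.collection.mutable

object FreshNameSketch {
  // Hypothetical, simplified name allocator; not Spark's CodegenContext.
  final class NameAllocator {
    private val counts = mutable.Map.empty[String, Int].withDefaultValue(0)

    // The first request for a base name returns it unchanged; repeated
    // requests for the same base get an increasing numeric suffix.
    def freshName(base: String): String = {
      val n = counts(base)
      counts(base) = n + 1
      if (n == 0) base else s"$base$n"
    }
  }
}
```

If two operators in the same generated class both ask for `initRange`, the second receives a different name, so emitting the function under the allocated name (as the `+` lines do with `${initRange}`) avoids defining the same method twice.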
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/20579 retest this please