[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22683 **[Test build #99565 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99565/testReport)** for PR 22683 at commit [`5188c54`](https://github.com/apache/spark/commit/5188c54fcf33c24dac341c044f7ffa75c272bf52). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23088 Merged build finished. Test PASSed.
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23088 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99564/ Test PASSed.
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23088 **[Test build #99564 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99564/testReport)** for PR 23088 at commit [`7956b27`](https://github.com/apache/spark/commit/7956b27bd6b19065d367d96cd5e2b448507c7dc4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/23072#discussion_r238087240

--- Diff: examples/src/main/scala/org/apache/spark/examples/ml/FPGrowthExample.scala ---
@@ -64,4 +64,3 @@ object FPGrowthExample {
     spark.stop()
   }
 }
-// scalastyle:on println
--- End diff --

yes, println is not used
[GitHub] spark issue #23164: [SPARK-26198][SQL] Fix Metadata serialize null values th...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/23164 I used it here: https://github.com/apache/spark/compare/master...wangyum:default-value?expand=1#diff-9847f5cef7cf7fbc5830fbc6b779ee10R1827
[GitHub] spark issue #23120: [SPARK-26151][SQL] Return partial results for bad CSV re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23120 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99563/ Test PASSed.
[GitHub] spark issue #23120: [SPARK-26151][SQL] Return partial results for bad CSV re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23120 Merged build finished. Test PASSed.
[GitHub] spark issue #23120: [SPARK-26151][SQL] Return partial results for bad CSV re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23120 **[Test build #99563 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99563/testReport)** for PR 23120 at commit [`8f2d69d`](https://github.com/apache/spark/commit/8f2d69d848b8242c529118436249019016069ca2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23088 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99561/ Test PASSed.
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23088 Merged build finished. Test PASSed.
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23088 **[Test build #99561 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99561/testReport)** for PR 23088 at commit [`dc95355`](https://github.com/apache/spark/commit/dc9535547c353509930ea340780611f3129da962). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23088 Merged build finished. Test FAILed.
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23088 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99562/ Test FAILed.
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23088 **[Test build #99562 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99562/testReport)** for PR 23088 at commit [`64fbe5d`](https://github.com/apache/spark/commit/64fbe5d7b845b6351e2dae2af231d2be37ca13b8). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23088 **[Test build #99566 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99566/testReport)** for PR 23088 at commit [`2235967`](https://github.com/apache/spark/commit/2235967190ef80d363b81274f400f4a42b3556f2).
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23088 Merged build finished. Test PASSed.
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23088 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5624/ Test PASSed.
[GitHub] spark issue #23198: Branch 2.4
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23198 Can one of the admins verify this patch?
[GitHub] spark pull request #23198: Branch 2.4
Github user edurekagithub closed the pull request at: https://github.com/apache/spark/pull/23198
[GitHub] spark pull request #23198: Branch 2.4
GitHub user edurekagithub opened a pull request: https://github.com/apache/spark/pull/23198

Branch 2.4

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/edurekagithub/spark branch-2.4

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23198.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #23198

commit b7efca7ece484ee85091b1b50bbc84ad779f9bfe
Author: Mario Molina
Date: 2018-09-11T12:47:14Z

[SPARK-17916][SPARK-25241][SQL][FOLLOW-UP] Fix empty string being parsed as null when nullValue is set.

## What changes were proposed in this pull request?

In the PR, I propose new CSV option `emptyValue` and an update in the SQL Migration Guide which describes how to revert previous behavior when empty strings were not written at all. Since Spark 2.4, empty strings are saved as `""` to distinguish them from saved `null`s.

Closes #22234
Closes #22367

## How was this patch tested?

It was tested by `CSVSuite` and new tests added in the PR #22234

Closes #22389 from MaxGekk/csv-empty-value-master.

Lead-authored-by: Mario Molina
Co-authored-by: Maxim Gekk
Signed-off-by: hyukjinkwon
(cherry picked from commit c9cb393dc414ae98093c1541d09fa3c8663ce276)
Signed-off-by: hyukjinkwon

commit 0b8bfbe12b8a368836d7ddc8445de18b7ee42cde
Author: Dongjoon Hyun
Date: 2018-09-11T15:57:42Z

[SPARK-25389][SQL] INSERT OVERWRITE DIRECTORY STORED AS should prevent duplicate fields

## What changes were proposed in this pull request?

Like `INSERT OVERWRITE DIRECTORY USING` syntax, `INSERT OVERWRITE DIRECTORY STORED AS` should not generate files with duplicate fields because Spark cannot read those files back.

**INSERT OVERWRITE DIRECTORY USING**

```scala
scala> sql("INSERT OVERWRITE DIRECTORY 'file:///tmp/parquet' USING parquet SELECT 'id', 'id2' id")
... ERROR InsertIntoDataSourceDirCommand: Failed to write to directory ...
org.apache.spark.sql.AnalysisException: Found duplicate column(s) when inserting into file:/tmp/parquet: `id`;
```

**INSERT OVERWRITE DIRECTORY STORED AS**

```scala
scala> sql("INSERT OVERWRITE DIRECTORY 'file:///tmp/parquet' STORED AS parquet SELECT 'id', 'id2' id")
// It generates corrupted files
scala> spark.read.parquet("/tmp/parquet").show
18/09/09 22:09:57 WARN DataSource: Found duplicate column(s) in the data schema and the partition schema: `id`;
```

## How was this patch tested?

Pass the Jenkins with newly added test cases.

Closes #22378 from dongjoon-hyun/SPARK-25389.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
(cherry picked from commit 77579aa8c35b0d98bbeac3c828bf68a1d190d13e)
Signed-off-by: Dongjoon Hyun

commit 4414e026097c74aadd252b541c9d3009cd7e9d09
Author: Gera Shegalov
Date: 2018-09-11T16:28:32Z

[SPARK-25221][DEPLOY] Consistent trailing whitespace treatment of conf values

## What changes were proposed in this pull request?

Stop trimming values of properties loaded from a file

## How was this patch tested?

Added unit test demonstrating the issue hit in production.

Closes #22213 from gerashegalov/gera/SPARK-25221.

Authored-by: Gera Shegalov
Signed-off-by: Marcelo Vanzin
(cherry picked from commit bcb9a8c83f4e6835af5dc51f1be7f964b8fa49a3)
Signed-off-by: Marcelo Vanzin

commit 16127e844f8334e1152b2e3ed3d878ec8de13dfa
Author: Liang-Chi Hsieh
Date: 2018-09-11T17:31:06Z

[SPARK-24889][CORE] Update block info when unpersist rdds

## What changes were proposed in this pull request?

We will update block info coming from executors, at the timing like caching a RDD. However, when removing RDDs with unpersisting, we don't ask to update block info. So the block info is not updated.

We can fix this with few options:

1. Ask to update block info when unpersisting

   This is simplest but changes driver-executor communication a bit.

2. Update block info when processing the event of unpersisting RDD

   We send a `SparkListenerUnpersistRDD` event when unpersisting RDD. When processing this event, we can update block info of the RDD. This only changes event processing code
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/22952 @zsxwing Yeah, ideally we would enforce that `archivePath` has no possibility of matching the source path (glob). My approach was to find the base directory, i.e. the directory that has no glob in itself or in any ancestor, and to require that `archive path + base directory of source path` does not belong to a sub-directory of that directory. For example, suppose the source path is `/a/b/c/*/ef?/*/g/h/*/i`; then the base directory of the source path is `/a/b/c`, and `archive path + base directory of source path` should not belong to a sub-directory of `/a/b/c`. (My code has a bug in finding the directory, so I need to fix it.) This is not an elegant approach, and it has false positives: it ends up rejecting some archive paths that don't actually overlap (too restrictive). But it would guarantee that the two paths never overlap, so there is no need to re-check when renaming a file. I guess the approach might be reasonable because in practice end users would rather not think about complicated overlap cases and would just isolate the two paths. What do you think about this approach? cc. @gaborgsomogyi Could you also help validate my approach?
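The base-directory rule described in the comment above can be sketched as follows. This is a minimal illustration and not the code from the PR: the object name `GlobBaseDir`, its helper names, the set of characters treated as glob metacharacters, and the reading of `archive path + base directory` as plain string concatenation are all assumptions made for this sketch.

```scala
object GlobBaseDir {
  // Assumption of this sketch: these characters mark a path component as a glob.
  private val globChars = Set('*', '?', '[', ']', '{', '}')

  private def hasGlob(component: String): Boolean =
    component.exists(globChars.contains)

  /** Longest leading run of path components that contain no glob character. */
  def baseDir(sourcePath: String): String = {
    val components = sourcePath.split('/').filter(_.nonEmpty)
    "/" + components.takeWhile(c => !hasGlob(c)).mkString("/")
  }

  /** True when `candidate` is `dir` itself or lies somewhere below it. */
  def isUnder(candidate: String, dir: String): Boolean = {
    val prefix = if (dir.endsWith("/")) dir else dir + "/"
    candidate == dir || candidate.startsWith(prefix)
  }

  /** Conservative check: reject archive paths whose combined target falls under the base directory. */
  def archivePathAllowed(archivePath: String, sourcePath: String): Boolean = {
    val base = baseDir(sourcePath)
    val target = archivePath.stripSuffix("/") + base // hypothetical "archive path + base directory"
    !isUnder(target, base)
  }
}
```

With the source path from the comment, `GlobBaseDir.baseDir` yields `/a/b/c`; an archive path such as `/archive` is accepted, while any archive path under `/a/b/c` is rejected whether or not the glob could really match it, which is the false-positive trade-off the comment mentions.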
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23088 Hi @pgandhi999, it seems that after your check-ins, when there are no summary metrics, the UI displays an empty table rather than the message shown in the PR title. Could you please help me fix that? Thanks.
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22683 **[Test build #99565 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99565/testReport)** for PR 22683 at commit [`5188c54`](https://github.com/apache/spark/commit/5188c54fcf33c24dac341c044f7ffa75c272bf52).
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/22683 retest this please
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23088 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5623/ Test PASSed.
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23088 Merged build finished. Test PASSed.
[GitHub] spark issue #23120: [SPARK-26151][SQL] Return partial results for bad CSV re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23120 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5622/ Test PASSed.
[GitHub] spark issue #23120: [SPARK-26151][SQL] Return partial results for bad CSV re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23120 Merged build finished. Test PASSed.
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23088 **[Test build #99564 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99564/testReport)** for PR 23088 at commit [`7956b27`](https://github.com/apache/spark/commit/7956b27bd6b19065d367d96cd5e2b448507c7dc4).
[GitHub] spark issue #23120: [SPARK-26151][SQL] Return partial results for bad CSV re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23120 **[Test build #99563 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99563/testReport)** for PR 23120 at commit [`8f2d69d`](https://github.com/apache/spark/commit/8f2d69d848b8242c529118436249019016069ca2).
[GitHub] spark issue #23120: [SPARK-26151][SQL] Return partial results for bad CSV re...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23120 retest this please
[GitHub] spark pull request #23120: [SPARK-26151][SQL] Return partial results for bad...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23120#discussion_r238083349

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala ---
@@ -243,21 +243,27 @@ class UnivocityParser(
         () => getPartialResult(),
         new RuntimeException("Malformed CSV record"))
     } else {
-      try {
-        // When the length of the returned tokens is identical to the length of the parsed schema,
-        // we just need to convert the tokens that correspond to the required columns.
-        var i = 0
-        while (i < requiredSchema.length) {
+      // When the length of the returned tokens is identical to the length of the parsed schema,
+      // we just need to convert the tokens that correspond to the required columns.
+      var badRecordException: Option[Throwable] = None
+      var i = 0
+      while (i < requiredSchema.length) {
+        try {
           row(i) = valueConverters(i).apply(getToken(tokens, i))
-          i += 1
+        } catch {
+          case NonFatal(e) =>
+            badRecordException = badRecordException.orElse(Some(e))
         }
+        i += 1
+      }
+
+      if (badRecordException.isEmpty) {
         row
-      } catch {
-        case NonFatal(e) =>
-          // For corrupted records with the number of tokens same as the schema,
-          // CSV reader doesn't support partial results. All fields other than the field
-          // configured by `columnNameOfCorruptRecord` are set to `null`.
-          throw BadRecordException(() => getCurrentInput, () => None, e)
+      } else {
+        // For corrupted records with the number of tokens same as the schema,
+        // CSV reader doesn't support partial results. All fields other than the field
+        // configured by `columnNameOfCorruptRecord` are set to `null`.
--- End diff --

what do you mean here?
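The change under review moves the `try` inside the loop so that one bad field no longer discards the fields that converted successfully. A self-contained sketch of that pattern, outside Spark, might look like the following; `PartialRowSketch`, `convertAll`, and the converter signature `String => Any` are hypothetical names chosen for illustration, not part of `UnivocityParser`.

```scala
import scala.util.control.NonFatal

object PartialRowSketch {
  /**
   * Converts every token with its converter. Fields that fail stay null,
   * and only the first failure is remembered, as in the diff above.
   */
  def convertAll(
      tokens: Array[String],
      converters: Array[String => Any]): (Array[Any], Option[Throwable]) = {
    val row = new Array[Any](converters.length)
    var firstError: Option[Throwable] = None
    var i = 0
    while (i < converters.length) {
      try {
        row(i) = converters(i)(tokens(i))
      } catch {
        case NonFatal(e) =>
          // Keep the earliest exception; later ones are dropped.
          firstError = firstError.orElse(Some(e))
      }
      i += 1 // advance even when the conversion failed
    }
    (row, firstError)
  }
}
```

A caller can return `row` directly when the second element is empty, or report the partially filled row together with the remembered exception otherwise.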
[GitHub] spark pull request #23154: [SPARK-26195][SQL] Correct exception messages in ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23154
[GitHub] spark issue #23154: [SPARK-26195][SQL] Correct exception messages in some cl...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23154 thanks, merging to master!
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23088

Hi @srowen. Yes, currently we need more optimized code for the disk store case.

> While it makes some sense I have two concerns: different answers based on disk vs memory store which shouldn't really affect things. But would a user ever have both and see both side by side and be confused?

This configurable store is only for the history server, so a user can configure only one of them at a time for the history server. But the live UI (which opens from the YARN UI) also goes through the same code flow, where it uses `ElementTrackingStore`, which is also in-memory. So, if a user configures the disk store for the history server and opens both the live UI and the in-progress history UI, the summary metrics will be different.

> changing the way the indexing works, so that you can index by specific metrics for successful and failed tasks differently, would be tricky, and also would require changing the disk store version (to invalidate old stores).

I think @vanzin's suggestion seems workable, but I need time to give it a try and test it. Maybe we can add a "TODO" for the disk store case or open a separate JIRA for that.

> Second is, that seems like it should still entail pushing down all the quantile logic into the KVStore, to be clean, right? and that's a bigger change.

Thanks @srowen for the suggestion. Probably @vanzin can answer this well. I have modified the code for the in-memory case. The disk store still uses the old code.
[GitHub] spark pull request #23178: [SPARK-26216][SQL] Do not use case class as publi...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23178
[GitHub] spark pull request #23178: [SPARK-26216][SQL] Do not use case class as publi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23178#discussion_r238083055 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala --- @@ -38,114 +38,106 @@ import org.apache.spark.sql.types.DataType * @since 1.3.0 */ @Stable --- End diff -- It's not a new API anyway, so it would be weird to change `@since` to 3.0.
[GitHub] spark issue #23178: [SPARK-26216][SQL] Do not use case class as public API (...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23178 thanks for the review, merging to master!
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23088 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5621/ Test PASSed.
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23088 Merged build finished. Test PASSed.
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23088 **[Test build #99562 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99562/testReport)** for PR 23088 at commit [`64fbe5d`](https://github.com/apache/spark/commit/64fbe5d7b845b6351e2dae2af231d2be37ca13b8).
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23088 **[Test build #99561 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99561/testReport)** for PR 23088 at commit [`dc95355`](https://github.com/apache/spark/commit/dc9535547c353509930ea340780611f3129da962).
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23088 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5620/ Test PASSed.
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23088 Merged build finished. Test PASSed.
[GitHub] spark pull request #23130: [SPARK-26161][SQL] Ignore empty files in load
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23130
[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23190 Merged build finished. Test FAILed.
[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23190 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5619/ Test FAILed.
[GitHub] spark issue #23130: [SPARK-26161][SQL] Ignore empty files in load
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23130 We don't need to block it, but @MaxGekk if you have time, it would be great to answer https://github.com/apache/spark/pull/23130#issuecomment-442491582 thanks, merging to master!
[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23190 Merged build finished. Test FAILed.
[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23190 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99560/ Test FAILed.
[GitHub] spark pull request #23187: [SPARK-26211][SQL][TEST][FOLLOW-UP] Combine test ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23187
[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23190 LGTM
[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23190 retest this please
[GitHub] spark issue #23187: [SPARK-26211][SQL][TEST][FOLLOW-UP] Combine test cases f...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23187 thanks, merging to master!
[GitHub] spark pull request #23153: [SPARK-26147][SQL] only pull out unevaluable pyth...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23153#discussion_r238082589 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -155,19 +155,20 @@ object EliminateOuterJoin extends Rule[LogicalPlan] with PredicateHelper { } /** - * PythonUDF in join condition can not be evaluated, this rule will detect the PythonUDF - * and pull them out from join condition. For python udf accessing attributes from only one side, - * they are pushed down by operation push down rules. If not (e.g. user disables filter push - * down rules), we need to pull them out in this rule too. + * PythonUDF in join condition can't be evaluated if it refers to attributes from both join sides. + * See `ExtractPythonUDFs` for details. This rule will detect un-evaluable PythonUDF and pull them + * out from join condition. */ object PullOutPythonUDFInJoinCondition extends Rule[LogicalPlan] with PredicateHelper { - def hasPythonUDF(expression: Expression): Boolean = { -expression.collectFirst { case udf: PythonUDF => udf }.isDefined + + private def hasUnevaluablePythonUDF(expr: Expression, j: Join): Boolean = { +expr.find { e => + PythonUDF.isScalarPythonUDF(e) && !canEvaluate(e, j.left) && !canEvaluate(e, j.right) --- End diff -- Only scalar UDFs can appear in a join condition, so changing it to `e.isInstanceOf[PythonUDF]` would be equivalent.
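The check in the diff above can be illustrated with a toy model. This is only a sketch: `Expr`, `Side`, and `isUnevaluablePythonUdf` are hypothetical stand-ins invented for illustration, not Spark's Catalyst classes, and `canEvaluate` here is a simplified version that treats an expression as evaluable on a join side when that side outputs every attribute the expression references.

```scala
// Toy model only: these types are hypothetical stand-ins, not Spark's Catalyst classes.
object UnevaluableUdfSketch {
  // An expression is reduced to the set of attribute names it references.
  final case class Expr(name: String, isPythonUdf: Boolean, references: Set[String])

  // A join side is reduced to the set of attribute names it outputs.
  final case class Side(output: Set[String])

  // An expression can be evaluated on a side when that side provides every attribute it needs.
  def canEvaluate(e: Expr, side: Side): Boolean = e.references.subsetOf(side.output)

  // A Python UDF has to be pulled out when neither side alone can evaluate it.
  def isUnevaluablePythonUdf(e: Expr, left: Side, right: Side): Boolean =
    e.isPythonUdf && !canEvaluate(e, left) && !canEvaluate(e, right)

  def main(args: Array[String]): Unit = {
    val left = Side(Set("a", "b"))
    val right = Side(Set("c"))
    val oneSided = Expr("udf(a, b)", isPythonUdf = true, references = Set("a", "b"))
    val twoSided = Expr("udf(a, c)", isPythonUdf = true, references = Set("a", "c"))
    println(isUnevaluablePythonUdf(oneSided, left, right)) // false: the left side can evaluate it
    println(isUnevaluablePythonUdf(twoSided, left, right)) // true: it needs attributes from both sides
  }
}
```

In this model a UDF that touches only one side stays where it is, which matches the narrower rule description in the new doc comment.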
[GitHub] spark issue #23173: [SPARK-26208][SQL] add headers to empty csv files when h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23173 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99559/ Test PASSed.
[GitHub] spark issue #23173: [SPARK-26208][SQL] add headers to empty csv files when h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23173 Merged build finished. Test PASSed.
[GitHub] spark issue #23173: [SPARK-26208][SQL] add headers to empty csv files when h...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23173 **[Test build #99559 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99559/testReport)** for PR 23173 at commit [`9d5cb7b`](https://github.com/apache/spark/commit/9d5cb7be9a8cb97bd54dd1e938ba819ed3066351). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #23192: [SPARK-26241][SQL] Add queryId to IncrementalExec...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23192
[GitHub] spark pull request #23193: [SPARK-26226][SQL] Track optimization phase for s...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23193
[GitHub] spark issue #23150: [SPARK-26178][SQL] Use java.time API for parsing timesta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23150 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99556/ Test PASSed.
[GitHub] spark issue #23150: [SPARK-26178][SQL] Use java.time API for parsing timesta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23150 Merged build finished. Test PASSed.
[GitHub] spark issue #23150: [SPARK-26178][SQL] Use java.time API for parsing timesta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23150 **[Test build #99556 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99556/testReport)** for PR 23150 at commit [`00509d3`](https://github.com/apache/spark/commit/00509d3a94e0679505cd9fde78e38a3a15d11bde). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timesta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23196 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99558/ Test FAILed.
[GitHub] spark issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timesta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23196 Merged build finished. Test FAILed.
[GitHub] spark issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timesta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23196 **[Test build #99558 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99558/testReport)** for PR 23196 at commit [`f326042`](https://github.com/apache/spark/commit/f326042aa1aff540d06c79fd73395204d846f3ea). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23190 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99557/ Test FAILed.
[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23190 Merged build finished. Test FAILed.
[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23190 **[Test build #99557 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99557/testReport)** for PR 23190 at commit [`87c567d`](https://github.com/apache/spark/commit/87c567d6629df9042f7189f9083e26405f5fb387). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23173: [SPARK-26208][SQL] add headers to empty csv files when h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23173 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5618/ Test PASSed.
[GitHub] spark issue #23173: [SPARK-26208][SQL] add headers to empty csv files when h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23173 Merged build finished. Test PASSed.
[GitHub] spark issue #23173: [SPARK-26208][SQL] add headers to empty csv files when h...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23173 **[Test build #99559 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99559/testReport)** for PR 23173 at commit [`9d5cb7b`](https://github.com/apache/spark/commit/9d5cb7be9a8cb97bd54dd1e938ba819ed3066351).
[GitHub] spark issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timesta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23196 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5617/ Test PASSed.
[GitHub] spark issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timesta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23196 Merged build finished. Test PASSed.
[GitHub] spark issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timesta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23196 **[Test build #99558 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99558/testReport)** for PR 23196 at commit [`f326042`](https://github.com/apache/spark/commit/f326042aa1aff540d06c79fd73395204d846f3ea).
[GitHub] spark pull request #23173: [SPARK-26208][SQL] add headers to empty csv files...
Github user koertkuipers commented on a diff in the pull request: https://github.com/apache/spark/pull/23173#discussion_r238077135 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala --- @@ -171,15 +171,21 @@ private[csv] class CsvOutputWriter( private var univocityGenerator: Option[UnivocityGenerator] = None - override def write(row: InternalRow): Unit = { -val gen = univocityGenerator.getOrElse { - val charset = Charset.forName(params.charset) - val os = CodecStreams.createOutputStreamWriter(context, new Path(path), charset) - val newGen = new UnivocityGenerator(dataSchema, os, params) - univocityGenerator = Some(newGen) - newGen -} + if (params.headerFlag) { +val gen = getGen() +gen.writeHeaders() + } + private def getGen(): UnivocityGenerator = univocityGenerator.getOrElse { +val charset = Charset.forName(params.charset) +val os = CodecStreams.createOutputStreamWriter(context, new Path(path), charset) +val newGen = new UnivocityGenerator(dataSchema, os, params) +univocityGenerator = Some(newGen) +newGen + } + + override def write(row: InternalRow): Unit = { +val gen = getGen() --- End diff -- I will revert this change to lazy val for now, since it doesn't have anything to do with this pull request or JIRA: the Option approach was created in another pull request.
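For readers comparing the two approaches mentioned in this comment, here is a minimal sketch of Option-based lazy initialization next to a `lazy val`. The `Resource`, `OptionBased`, and `LazyValBased` names are hypothetical and stand in for an expensive object such as the CSV generator; this is not the code from either pull request.

```scala
object LazyInitSketch {
  // Hypothetical stand-in for an expensive resource such as a CSV generator.
  final class Resource(val id: Int)

  final class OptionBased {
    var created = 0
    private var resource: Option[Resource] = None

    // Create the resource on first use and cache it in the Option.
    def get(): Resource = resource.getOrElse {
      created += 1
      val r = new Resource(created)
      resource = Some(r)
      r
    }

    // The Option lets callers check for creation without triggering it.
    def isInitialized: Boolean = resource.isDefined
  }

  final class LazyValBased {
    var created = 0

    // The initializer runs once, on first access.
    lazy val resource: Resource = {
      created += 1
      new Resource(created)
    }
  }

  def main(args: Array[String]): Unit = {
    val o = new OptionBased
    println(o.isInitialized) // false: nothing has been created yet
    o.get(); o.get()
    println(o.created)       // 1: the second call reuses the cached resource

    val l = new LazyValBased
    l.resource; l.resource
    println(l.created)       // 1: the lazy val initializer ran once
  }
}
```

Both variants create the resource at most once in single-threaded use. In this sketch the practical difference is that the Option version can report whether the resource exists without forcing its creation.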
[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23190 Merged build finished. Test PASSed.
[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23190 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5616/ Test PASSed.
[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23190 **[Test build #99557 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99557/testReport)** for PR 23190 at commit [`87c567d`](https://github.com/apache/spark/commit/87c567d6629df9042f7189f9083e26405f5fb387).
[GitHub] spark pull request #23150: [SPARK-26178][SQL] Use java.time API for parsing ...
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/23150#discussion_r238075711 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/csv/UnivocityParserSuite.scala --- @@ -86,62 +85,74 @@ class UnivocityParserSuite extends SparkFunSuite with SQLHelper { // null. Seq(true, false).foreach { b => val options = new CSVOptions(Map("nullValue" -> "null"), false, "GMT") - val converter = -parser.makeConverter("_1", StringType, nullable = b, options = options) + val parser = new UnivocityParser(StructType(Seq.empty), options) + val converter = parser.makeConverter("_1", StringType, nullable = b) assert(converter.apply("") == UTF8String.fromString("")) } } test("Throws exception for empty string with non null type") { - val options = new CSVOptions(Map.empty[String, String], false, "GMT") +val options = new CSVOptions(Map.empty[String, String], false, "GMT") +val parser = new UnivocityParser(StructType(Seq.empty), options) val exception = intercept[RuntimeException]{ - parser.makeConverter("_1", IntegerType, nullable = false, options = options).apply("") + parser.makeConverter("_1", IntegerType, nullable = false).apply("") } assert(exception.getMessage.contains("null value found but field _1 is not nullable.")) } test("Types are cast correctly") { val options = new CSVOptions(Map.empty[String, String], false, "GMT") -assert(parser.makeConverter("_1", ByteType, options = options).apply("10") == 10) -assert(parser.makeConverter("_1", ShortType, options = options).apply("10") == 10) -assert(parser.makeConverter("_1", IntegerType, options = options).apply("10") == 10) -assert(parser.makeConverter("_1", LongType, options = options).apply("10") == 10) -assert(parser.makeConverter("_1", FloatType, options = options).apply("1.00") == 1.0) -assert(parser.makeConverter("_1", DoubleType, options = options).apply("1.00") == 1.0) -assert(parser.makeConverter("_1", BooleanType, options = options).apply("true") == true) - -val 
timestampsOptions = +var parser = new UnivocityParser(StructType(Seq.empty), options) +assert(parser.makeConverter("_1", ByteType).apply("10") == 10) +assert(parser.makeConverter("_1", ShortType).apply("10") == 10) +assert(parser.makeConverter("_1", IntegerType).apply("10") == 10) +assert(parser.makeConverter("_1", LongType).apply("10") == 10) +assert(parser.makeConverter("_1", FloatType).apply("1.00") == 1.0) +assert(parser.makeConverter("_1", DoubleType).apply("1.00") == 1.0) +assert(parser.makeConverter("_1", BooleanType).apply("true") == true) + +var timestampsOptions = new CSVOptions(Map("timestampFormat" -> "dd/MM/ hh:mm"), false, "GMT") +parser = new UnivocityParser(StructType(Seq.empty), timestampsOptions) val customTimestamp = "31/01/2015 00:00" -val expectedTime = timestampsOptions.timestampFormat.parse(customTimestamp).getTime -val castedTimestamp = - parser.makeConverter("_1", TimestampType, nullable = true, options = timestampsOptions) +var format = FastDateFormat.getInstance( + timestampsOptions.timestampFormat, timestampsOptions.timeZone, timestampsOptions.locale) +val expectedTime = format.parse(customTimestamp).getTime +val castedTimestamp = parser.makeConverter("_1", TimestampType, nullable = true) .apply(customTimestamp) assert(castedTimestamp == expectedTime * 1000L) val customDate = "31/01/2015" val dateOptions = new CSVOptions(Map("dateFormat" -> "dd/MM/"), false, "GMT") -val expectedDate = dateOptions.dateFormat.parse(customDate).getTime -val castedDate = - parser.makeConverter("_1", DateType, nullable = true, options = dateOptions) -.apply(customTimestamp) -assert(castedDate == DateTimeUtils.millisToDays(expectedDate)) +parser = new UnivocityParser(StructType(Seq.empty), dateOptions) +format = FastDateFormat.getInstance( + dateOptions.dateFormat, dateOptions.timeZone, dateOptions.locale) +val expectedDate = format.parse(customDate).getTime +val castedDate = parser.makeConverter("_1", DateType, nullable = true) +.apply(customDate) 
+assert(castedDate == DateTimeUtils.millisToDays(expectedDate, TimeZone.getTimeZone("GMT"))) val timestamp = "2015-01-01 00:00:00" -assert(parser.makeConverter("_1", TimestampType, options = options).apply(timestamp) == - DateTimeUtils.stringToTime(timestamp).getTime * 1000L) -assert(parser.makeConverter("_1", DateType, options =
[GitHub] spark pull request #23150: [SPARK-26178][SQL] Use java.time API for parsing ...
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/23150#discussion_r238075664 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -622,10 +623,11 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils with Te val options = Map( "header" -> "true", "inferSchema" -> "false", - "dateFormat" -> "dd/MM/ hh:mm") + "dateFormat" -> "dd/MM/ HH:mm") --- End diff -- According to ISO 8601: ``` h clock-hour-of-am-pm (1-12) number 12 H hour-of-day (0-23) number 0 ``` but the real data is not in the range allowed for `h`: ``` date 26/08/2015 18:00 27/10/2014 18:30 28/01/2016 20:00 ```
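The difference between the two pattern letters can be checked directly against `java.time`. The sketch below is an illustration, not code from the pull request: the `tryParse` helper is hypothetical, and the full pattern `dd/MM/yyyy HH:mm` is an assumption chosen to match the sample rows quoted in the comment.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import scala.util.Try

object HourPatternSketch {
  // Hypothetical helper: returns the parsed value, or None when parsing fails.
  def tryParse(pattern: String, text: String): Option[LocalDateTime] = {
    val formatter = DateTimeFormatter.ofPattern(pattern)
    Try(LocalDateTime.parse(text, formatter)).toOption
  }

  def main(args: Array[String]): Unit = {
    val sample = "26/08/2015 18:00"
    // 'HH' is hour-of-day (0-23), so 18 is accepted.
    println(tryParse("dd/MM/yyyy HH:mm", sample))
    // 'hh' is clock-hour-of-am-pm (1-12), so 18 is out of range and parsing fails.
    println(tryParse("dd/MM/yyyy hh:mm", sample))
  }
}
```

Note that with `hh` and no AM/PM marker in the pattern, `LocalDateTime.parse` cannot resolve an hour of day even for in-range values, so `HH` is the pattern that fits 24-hour data like the rows above.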
[GitHub] spark issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timesta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23196 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99555/ Test FAILed.
[GitHub] spark issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timesta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23196 Merged build finished. Test FAILed.
[GitHub] spark issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timesta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23196 **[Test build #99555 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99555/testReport)** for PR 23196 at commit [`4646ded`](https://github.com/apache/spark/commit/4646dededae832185a35a85244baab6507d28f0d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #23150: [SPARK-26178][SQL] Use java.time API for parsing ...
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/23150#discussion_r238075585 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -1107,7 +,7 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils with Te test("SPARK-18699 put malformed records in a `columnNameOfCorruptRecord` field") { Seq(false, true).foreach { multiLine => - val schema = new StructType().add("a", IntegerType).add("b", DateType) --- End diff -- I changed the type because the supposedly valid date `"1983-08-04"` cannot be parsed with the default timestamp pattern.
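A small sketch of why a date-only string fails against a timestamp pattern under `java.time`. The timestamp pattern used here, `yyyy-MM-dd'T'HH:mm:ss.SSSXXX`, is an assumption made for illustration and may not be the exact default in the code under review; `matches` is a hypothetical helper.

```scala
import java.time.format.DateTimeFormatter
import scala.util.Try

object DatePatternSketch {
  // Assumed patterns for illustration; the project's actual defaults may differ.
  val timestampPattern = "yyyy-MM-dd'T'HH:mm:ss.SSSXXX"
  val datePattern = "yyyy-MM-dd"

  // Hypothetical helper: true when the whole text parses under the pattern.
  def matches(pattern: String, text: String): Boolean =
    Try(DateTimeFormatter.ofPattern(pattern).parse(text)).isSuccess

  def main(args: Array[String]): Unit = {
    val value = "1983-08-04"
    println(matches(datePattern, value))      // true: the text is a complete date
    println(matches(timestampPattern, value)) // false: the time and offset parts are missing
  }
}
```

The formatter requires the whole pattern to be present in the input, so a bare date is rejected by a timestamp pattern rather than being padded with a default time.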
[GitHub] spark issue #23150: [SPARK-26178][SQL] Use java.time API for parsing timesta...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/23150 > they pass right? is there another test you were unable to add? For now everything passes. I ran all tests locally in different time zones (set via the JVM parameter `-Duser.timezone`). I updated the migration guide since I enabled the new parser by default.
[GitHub] spark issue #23150: [SPARK-26178][SQL] Use java.time API for parsing timesta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23150 Merged build finished. Test PASSed.
[GitHub] spark issue #23150: [SPARK-26178][SQL] Use java.time API for parsing timesta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23150 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5615/ Test PASSed.
[GitHub] spark issue #23150: [SPARK-26178][SQL] Use java.time API for parsing timesta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23150 **[Test build #99556 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99556/testReport)** for PR 23150 at commit [`00509d3`](https://github.com/apache/spark/commit/00509d3a94e0679505cd9fde78e38a3a15d11bde).
[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23190 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99554/ Test FAILed.
[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23190 Merged build finished. Test FAILed.
[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23190 **[Test build #99554 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99554/testReport)** for PR 23190 at commit [`5b39ed3`](https://github.com/apache/spark/commit/5b39ed317d0da16162484cc8f8d94e0df4c2f30e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23150: [SPARK-26178][SQL] Use java.time API for parsing timesta...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/23150 It makes sense that parsing depends on a timezone, though that's set as an option in the parser typically. The tests should generally test "GMT" for this reason. If there's a default code path for when no timezone is specified, then I'd use the test harness mechanisms for temporarily changing the system timezone to GMT (which then automatically changes back). Your changes look OK here and they pass right? is there another test you were unable to add?
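The idea of temporarily switching the JVM default time zone and restoring it afterwards can be sketched as follows. `withDefaultTimeZone` is a hypothetical helper written for illustration; it is not the test harness mechanism the comment refers to, whose exact name and signature are not given in this thread.

```scala
import java.util.TimeZone

object TimeZoneSketch {
  // Hypothetical helper: run `body` with `tz` as the JVM default, then restore the old default.
  def withDefaultTimeZone[T](tz: TimeZone)(body: => T): T = {
    val original = TimeZone.getDefault
    TimeZone.setDefault(tz)
    // The finally block restores the original zone even if `body` throws.
    try body finally TimeZone.setDefault(original)
  }

  def main(args: Array[String]): Unit = {
    val inside = withDefaultTimeZone(TimeZone.getTimeZone("GMT")) {
      TimeZone.getDefault.getID
    }
    println(inside) // the default seen inside the block is GMT
  }
}
```

Because `TimeZone.setDefault` changes global JVM state, a helper like this is only safe when tests that depend on the default zone do not run concurrently in the same JVM.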
[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 cc@wzhfy
[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 do we need to handle this scenario? do we have any PR for handling this issue?
[GitHub] spark pull request #23173: [SPARK-26208][SQL] add headers to empty csv files...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/23173#discussion_r238073218 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala --- @@ -171,15 +171,21 @@ private[csv] class CsvOutputWriter( private var univocityGenerator: Option[UnivocityGenerator] = None - override def write(row: InternalRow): Unit = { -val gen = univocityGenerator.getOrElse { - val charset = Charset.forName(params.charset) - val os = CodecStreams.createOutputStreamWriter(context, new Path(path), charset) - val newGen = new UnivocityGenerator(dataSchema, os, params) - univocityGenerator = Some(newGen) - newGen -} + if (params.headerFlag) { +val gen = getGen() +gen.writeHeaders() + } + private def getGen(): UnivocityGenerator = univocityGenerator.getOrElse { +val charset = Charset.forName(params.charset) +val os = CodecStreams.createOutputStreamWriter(context, new Path(path), charset) +val newGen = new UnivocityGenerator(dataSchema, os, params) +univocityGenerator = Some(newGen) +newGen + } + + override def write(row: InternalRow): Unit = { +val gen = getGen() --- End diff -- Yeah we have two different approaches, both of which are fine IMHO. I think it's reasonable to clean that up in a follow-up if desired. WDYT @HyukjinKwon ?
[GitHub] spark issue #23197: [SPARK-26165][Optimizer] Filter Query Date and Timestamp...
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/23197 cc marmbrus yhuai srowen vinodkc
[GitHub] spark issue #23197: [SPARK-26165][Optimizer] Filter Query Date and Timestamp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23197 Can one of the admins verify this patch?
[GitHub] spark issue #23197: [SPARK-26165][Optimizer] Flter Query Date and Timestamp ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23197 Can one of the admins verify this patch?