[GitHub] spark pull request #20260: [SPARK-23039][SQL] Finish TODO work in alter tabl...
Github user xubo245 closed the pull request at: https://github.com/apache/spark/pull/20260 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20260: [SPARK-23039][SQL] Finish TODO work in alter tabl...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20260#discussion_r180359425 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -787,7 +787,7 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat val storageWithLocation = { val tableLocation = getLocationFromStorageProps(table) // We pass None as `newPath` here, to remove the path option in storage properties. - updateLocationInStorageProps(table, newPath = None).copy( + table.storage.copy( --- End diff -- So we should remove the TODO comment/work?
[GitHub] spark issue #20260: [SPARK-23039][SQL] Finish TODO work in alter table set l...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20260 This PR is different
[GitHub] spark issue #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION should chan...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20249 Exactly, SET LOCATION works; it just does not update the path in the storage properties when the path of a partition in the table is changed. This PR just wants to finish this TODO(gatorsmile) and add the partition path to the properties.
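For illustration, the TODO under discussion (keeping the `path` entry of the storage properties in sync when a partition's location changes) can be sketched with a plain dict. This is a hypothetical model for the comment above, not Spark's actual `CatalogStorageFormat` API:

```python
def update_location_in_storage_props(props, new_path):
    """Return a copy of the storage properties with the 'path' entry
    kept in sync with the new location (None removes the entry)."""
    updated = {k: v for k, v in props.items() if k.lower() != "path"}
    if new_path is not None:
        updated["path"] = new_path
    return updated

# Changing a partition's location should also refresh the path property.
props = {"path": "/warehouse/t/part=1", "serialization.format": "1"}
moved = update_location_in_storage_props(props, "/new/warehouse/t/part=1")
removed = update_location_in_storage_props(props, None)
```

Passing `None`, as the quoted diff does, drops the stale `path` option instead of leaving it pointing at the old location.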
[GitHub] spark issue #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION should chan...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20249 It belongs to the TODO work @tgravescs
[GitHub] spark issue #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION should chan...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20249 rebase
[GitHub] spark issue #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION should chan...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20249 @gatorsmile @attilapiros Could you help to review? How should this PR be handled? It has been a long time since it was created.
[GitHub] spark pull request #20583: [SPARK-23392][TEST] Add some test cases for image...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20583#discussion_r167740107 --- Diff: mllib/src/test/scala/org/apache/spark/ml/image/ImageSchemaSuite.scala --- @@ -65,11 +65,71 @@ class ImageSchemaSuite extends SparkFunSuite with MLlibTestSparkContext { assert(count50 > 0 && count50 < countTotal) } + test("readImages test: recursive = false") { +val df = readImages(imagePath, null, false, 3, true, 1.0, 0) +assert(df.count() === 0) + } + + test("readImages test: read jpg image") { +val df = readImages(imagePath + "/kittens/DP153539.jpg", null, false, 3, true, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: read png image") { +val df = readImages(imagePath + "/multi-channel/BGRA.png", null, false, 3, true, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: read non image") { +val df = readImages(imagePath + "/kittens/not-image.txt", null, false, 3, true, 1.0, 0) +assert(df.schema("image").dataType == columnSchema, "data do not fit ImageSchema") +assert(df.count() === 0) + } + + test("readImages test: read non image and dropImageFailures is false") { +val df = readImages(imagePath + "/kittens/not-image.txt", null, false, 3, false, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: sampleRatio > 1") { +val e = intercept[IllegalArgumentException] { + readImages(imagePath, null, true, 3, true, 1.1, 0) +} +assert(e.getMessage.equals("requirement failed: sampleRatio should be between 0 and 1")) + } + + test("readImages test: sampleRatio < 0") { +val e = intercept[IllegalArgumentException] { + readImages(imagePath, null, true, 3, true, -0.1, 0) +} +assert(e.getMessage.equals("requirement failed: sampleRatio should be between 0 and 1")) --- End diff -- Thanks, done --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20583: [SPARK-23392][TEST] Add some test cases for image...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20583#discussion_r167545871 --- Diff: mllib/src/test/scala/org/apache/spark/ml/image/ImageSchemaSuite.scala --- @@ -65,11 +65,71 @@ class ImageSchemaSuite extends SparkFunSuite with MLlibTestSparkContext { assert(count50 > 0 && count50 < countTotal) } + test("readImages test: recursive = false") { +val df = readImages(imagePath, null, false, 3, true, 1.0, 0) +assert(df.count() === 0) + } + + test("readImages test: read jpg image") { +val df = readImages(imagePath + "/kittens/DP153539.jpg", null, false, 3, true, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: read png image") { +val df = readImages(imagePath + "/multi-channel/BGRA.png", null, false, 3, true, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: read non image") { +val df = readImages(imagePath + "/kittens/not-image.txt", null, false, 3, true, 1.0, 0) +assert(df.count() === 0) + } + + test("readImages test: read non image and dropImageFailures is false") { +val df = readImages(imagePath + "/kittens/not-image.txt", null, false, 3, false, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: sampleRatio > 1") { +val e = intercept[IllegalArgumentException] { + readImages(imagePath, null, true, 3, true, 1.1, 0) +} +assert(e.getMessage.equals("requirement failed: sampleRatio should be between 0 and 1")) + } + + test("readImages test: sampleRatio < 0") { +val e = intercept[IllegalArgumentException] { + readImages(imagePath, null, true, 3, true, -0.1, 0) +} +assert(e.getMessage.equals("requirement failed: sampleRatio should be between 0 and 1")) + } + + test("readImages test: sampleRatio = 0") { +val df = readImages(imagePath, null, true, 3, true, 0.0, 0) +assert(df.count() === 0) + } + + test("readImages test: with sparkSession") { +val df = readImages(imagePath, sparkSession = spark, true, 3, true, 1.0, 0) --- End diff -- Can you check it? 
This PR was not merged into branch-2.3: https://github.com/apache/spark/pull/20389 I fetched the code of branch 2.3 before.
[GitHub] spark pull request #20583: [SPARK-23392][TEST] Add some test cases for image...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20583#discussion_r167481154 --- Diff: mllib/src/test/scala/org/apache/spark/ml/image/ImageSchemaSuite.scala --- @@ -65,11 +65,71 @@ class ImageSchemaSuite extends SparkFunSuite with MLlibTestSparkContext { assert(count50 > 0 && count50 < countTotal) } + test("readImages test: recursive = false") { +val df = readImages(imagePath, null, false, 3, true, 1.0, 0) +assert(df.count() === 0) + } + + test("readImages test: read jpg image") { +val df = readImages(imagePath + "/kittens/DP153539.jpg", null, false, 3, true, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: read png image") { +val df = readImages(imagePath + "/multi-channel/BGRA.png", null, false, 3, true, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: read non image") { +val df = readImages(imagePath + "/kittens/not-image.txt", null, false, 3, true, 1.0, 0) +assert(df.count() === 0) + } + + test("readImages test: read non image and dropImageFailures is false") { +val df = readImages(imagePath + "/kittens/not-image.txt", null, false, 3, false, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: sampleRatio > 1") { +val e = intercept[IllegalArgumentException] { + readImages(imagePath, null, true, 3, true, 1.1, 0) +} +assert(e.getMessage.equals("requirement failed: sampleRatio should be between 0 and 1")) + } + + test("readImages test: sampleRatio < 0") { +val e = intercept[IllegalArgumentException] { + readImages(imagePath, null, true, 3, true, -0.1, 0) +} +assert(e.getMessage.equals("requirement failed: sampleRatio should be between 0 and 1")) + } + + test("readImages test: sampleRatio = 0") { +val df = readImages(imagePath, null, true, 3, true, 0.0, 0) +assert(df.count() === 0) + } + + test("readImages test: with sparkSession") { +val df = readImages(imagePath, sparkSession = spark, true, 3, true, 1.0, 0) --- End diff -- It should be. I try it now. 
[GitHub] spark pull request #20583: [SPARK-23392][TEST] Add some test cases for image...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20583#discussion_r167479724 --- Diff: mllib/src/test/scala/org/apache/spark/ml/image/ImageSchemaSuite.scala --- @@ -65,11 +65,71 @@ class ImageSchemaSuite extends SparkFunSuite with MLlibTestSparkContext { assert(count50 > 0 && count50 < countTotal) } + test("readImages test: recursive = false") { +val df = readImages(imagePath, null, false, 3, true, 1.0, 0) +assert(df.count() === 0) + } + + test("readImages test: read jpg image") { +val df = readImages(imagePath + "/kittens/DP153539.jpg", null, false, 3, true, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: read png image") { +val df = readImages(imagePath + "/multi-channel/BGRA.png", null, false, 3, true, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: read non image") { +val df = readImages(imagePath + "/kittens/not-image.txt", null, false, 3, true, 1.0, 0) +assert(df.count() === 0) + } + + test("readImages test: read non image and dropImageFailures is false") { +val df = readImages(imagePath + "/kittens/not-image.txt", null, false, 3, false, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: sampleRatio > 1") { +val e = intercept[IllegalArgumentException] { + readImages(imagePath, null, true, 3, true, 1.1, 0) +} +assert(e.getMessage.equals("requirement failed: sampleRatio should be between 0 and 1")) + } + + test("readImages test: sampleRatio < 0") { +val e = intercept[IllegalArgumentException] { + readImages(imagePath, null, true, 3, true, -0.1, 0) +} +assert(e.getMessage.equals("requirement failed: sampleRatio should be between 0 and 1")) + } + + test("readImages test: sampleRatio = 0") { +val df = readImages(imagePath, null, true, 3, true, 0.0, 0) +assert(df.count() === 0) + } + + test("readImages test: with sparkSession") { +val df = readImages(imagePath, sparkSession = spark, true, 3, true, 1.0, 0) +assert(df.count() === 7) + } + test("readImages partition test") 
{ val df = readImages(imagePath, null, true, 3, true, 1.0, 0) assert(df.rdd.getNumPartitions === 3) } + test("readImages partition test: < 0") { +val df = readImages(imagePath, null, true, -3, true, 1.0, 0) +assert(df.rdd.getNumPartitions === spark.sparkContext.defaultParallelism) + } + + test("readImages partition test: = 0") { +val df = readImages(imagePath, null, true, 0, true, 1.0, 0) +assert(df.rdd.getNumPartitions != 0) --- End diff -- Ok, done
[GitHub] spark pull request #20583: [SPARK-23392][TEST] Add some test cases for image...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20583#discussion_r167479658 --- Diff: mllib/src/test/scala/org/apache/spark/ml/image/ImageSchemaSuite.scala --- @@ -65,11 +65,71 @@ class ImageSchemaSuite extends SparkFunSuite with MLlibTestSparkContext { assert(count50 > 0 && count50 < countTotal) } + test("readImages test: recursive = false") { +val df = readImages(imagePath, null, false, 3, true, 1.0, 0) +assert(df.count() === 0) + } + + test("readImages test: read jpg image") { +val df = readImages(imagePath + "/kittens/DP153539.jpg", null, false, 3, true, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: read png image") { +val df = readImages(imagePath + "/multi-channel/BGRA.png", null, false, 3, true, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: read non image") { +val df = readImages(imagePath + "/kittens/not-image.txt", null, false, 3, true, 1.0, 0) +assert(df.count() === 0) + } + + test("readImages test: read non image and dropImageFailures is false") { +val df = readImages(imagePath + "/kittens/not-image.txt", null, false, 3, false, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: sampleRatio > 1") { +val e = intercept[IllegalArgumentException] { + readImages(imagePath, null, true, 3, true, 1.1, 0) +} +assert(e.getMessage.equals("requirement failed: sampleRatio should be between 0 and 1")) + } + + test("readImages test: sampleRatio < 0") { +val e = intercept[IllegalArgumentException] { + readImages(imagePath, null, true, 3, true, -0.1, 0) +} +assert(e.getMessage.equals("requirement failed: sampleRatio should be between 0 and 1")) + } + + test("readImages test: sampleRatio = 0") { +val df = readImages(imagePath, null, true, 3, true, 0.0, 0) +assert(df.count() === 0) + } + + test("readImages test: with sparkSession") { +val df = readImages(imagePath, sparkSession = spark, true, 3, true, 1.0, 0) +assert(df.count() === 7) + } + test("readImages partition test") 
{ val df = readImages(imagePath, null, true, 3, true, 1.0, 0) assert(df.rdd.getNumPartitions === 3) } + test("readImages partition test: < 0") { +val df = readImages(imagePath, null, true, -3, true, 1.0, 0) +assert(df.rdd.getNumPartitions === spark.sparkContext.defaultParallelism) + } + + test("readImages partition test: = 0") { --- End diff -- test
[GitHub] spark pull request #20583: [SPARK-23392][TEST] Add some test cases for image...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20583#discussion_r167479566 --- Diff: mllib/src/test/scala/org/apache/spark/ml/image/ImageSchemaSuite.scala --- @@ -65,11 +65,71 @@ class ImageSchemaSuite extends SparkFunSuite with MLlibTestSparkContext { assert(count50 > 0 && count50 < countTotal) } + test("readImages test: recursive = false") { +val df = readImages(imagePath, null, false, 3, true, 1.0, 0) +assert(df.count() === 0) + } + + test("readImages test: read jpg image") { +val df = readImages(imagePath + "/kittens/DP153539.jpg", null, false, 3, true, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: read png image") { +val df = readImages(imagePath + "/multi-channel/BGRA.png", null, false, 3, true, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: read non image") { +val df = readImages(imagePath + "/kittens/not-image.txt", null, false, 3, true, 1.0, 0) +assert(df.count() === 0) + } + + test("readImages test: read non image and dropImageFailures is false") { +val df = readImages(imagePath + "/kittens/not-image.txt", null, false, 3, false, 1.0, 0) +assert(df.count() === 1) --- End diff -- Ok, done --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
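The sampleRatio tests in the diff above compare against the exact message produced by Scala's `require`. A minimal Python sketch of the same precondition check, and of how a test can capture the message (the function name is hypothetical):

```python
def check_sample_ratio(sample_ratio):
    # Mirrors Scala's require(sampleRatio >= 0 && sampleRatio <= 1, ...):
    # same "requirement failed: ..." message the test cases assert on.
    if not (0 <= sample_ratio <= 1):
        raise ValueError(
            "requirement failed: sampleRatio should be between 0 and 1")
    return sample_ratio

# A test catches the exception and compares the exact message:
try:
    check_sample_ratio(-0.1)
    message = None
except ValueError as e:
    message = str(e)
```

Both out-of-range cases in the suite (1.1 and -0.1) would fail this check with the same message; 0.0 and 1.0 pass, which is why the `sampleRatio = 0` case needs its own test asserting an empty result rather than an exception.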
[GitHub] spark issue #20583: [SPARK-23392][TEST] Add some test cases for images featu...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20583 Sorry, done.
[GitHub] spark pull request #20583: [CARBONDATA-23392][TEST] Add some test cases for ...
GitHub user xubo245 opened a pull request: https://github.com/apache/spark/pull/20583 [CARBONDATA-23392][TEST] Add some test cases for images feature ## What changes were proposed in this pull request? Add some test cases for images feature ## How was this patch tested? Add some test cases in ImageSchemaSuite You can merge this pull request into a Git repository by running: $ git pull https://github.com/xubo245/spark CARBONDATA23392_AddTestForImage Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20583.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20583 commit bc375112274d862de536841208d6e7cda151afe2 Author: xubo245 <601450868@...> Date: 2018-02-12T03:28:41Z [CARBONDATA-23392][TEST] Add some test case for images feature
[GitHub] spark pull request #20456: [SPARK-22624][PYSPARK] Expose range partitioning ...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20456#discussion_r166162010 --- Diff: python/pyspark/sql/dataframe.py --- @@ -667,6 +667,51 @@ def repartition(self, numPartitions, *cols): else: raise TypeError("numPartitions should be an int or Column") +@since("2.3.0") --- End diff -- ok, rename to 2.4.0, done
[GitHub] spark pull request #20456: [SPARK-22624][PYSPARK] Expose range partitioning ...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20456#discussion_r165838751 --- Diff: python/pyspark/sql/dataframe.py --- @@ -667,6 +667,51 @@ def repartition(self, numPartitions, *cols): else: raise TypeError("numPartitions should be an int or Column") +@since("2.3.0") +def repartitionByRange(self, numPartitions, *cols): +""" +Returns a new :class:`DataFrame` partitioned by the given partitioning expressions. The +resulting DataFrame is range partitioned. + +``numPartitions`` can be an int to specify the target number of partitions or a Column. +If it is a Column, it will be used as the first partitioning column. If not specified, +the default number of partitions is used. + +At least one partition-by expression must be specified. +When no explicit sort order is specified, "ascending nulls first" is assumed. + +>>> df.repartitionByRange(2, "age").rdd.getNumPartitions() +2 +>>> df.show() ++---+-+ +|age| name| ++---+-+ +| 2|Alice| +| 5| Bob| ++---+-+ +>>> df.repartitionByRange(1, "age").rdd.getNumPartitions() +1 +>>> data = df.repartitionByRange("age") +>>> df.show() ++---+-+ +|age| name| ++---+-+ +| 2|Alice| +| 5| Bob| ++---+-+ +""" +if isinstance(numPartitions, int): +if len(cols) == 0: +return ValueError("At least one partition-by expression must be specified.") +else: +return DataFrame( +self._jdf.repartitionByRange(numPartitions, self._jcols(*cols)), self.sql_ctx) +elif isinstance(numPartitions, (basestring, Column)): +cols = (numPartitions,) + cols +return DataFrame(self._jdf.repartitionByRange(self._jcols(*cols)), self.sql_ctx) +else: +raise TypeError("numPartitions should be an int or Column") --- End diff -- ok, done --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20456: [SPARK-22624][PYSPARK] Expose range partitioning ...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20456#discussion_r165836612 --- Diff: python/pyspark/sql/dataframe.py --- @@ -667,6 +667,92 @@ def repartition(self, numPartitions, *cols): else: raise TypeError("numPartitions should be an int or Column") +@since("2.3.0") +def repartitionByRange(self, numPartitions, *cols): +""" +Returns a new :class:`DataFrame` partitioned by the given partitioning expressions. The +resulting DataFrame is range partitioned. + +``numPartitions`` can be an int to specify the target number of partitions or a Column. +If it is a Column, it will be used as the first partitioning column. If not specified, +the default number of partitions is used. + +At least one partition-by expression must be specified. +When no explicit sort order is specified, "ascending nulls first" is assumed. + +>>> df.repartitionByRange(2, "age").rdd.getNumPartitions() +2 +>>> df.show() ++---+-+ +|age| name| ++---+-+ +| 2|Alice| +| 5| Bob| ++---+-+ +>>> df.repartitionByRange(1, "age").rdd.getNumPartitions() +1 +>>> data = df.union(df) +>>> data.show() ++---+-+ +|age| name| ++---+-+ +| 2|Alice| +| 5| Bob| +| 2|Alice| +| 5| Bob| ++---+-+ +>>> data = data.repartitionByRange(3, "age") +>>> data.show() ++---+-+ +|age| name| ++---+-+ +| 2|Alice| +| 2|Alice| +| 5| Bob| +| 5| Bob| ++---+-+ +>>> data.rdd.getNumPartitions() +3 +>>> data = data.repartitionByRange("age") +>>> data.rdd.getNumPartitions() +3 +>>> data2 = df.union(df).union(df) --- End diff -- ok, remove union --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20456: [SPARK-22624][PYSPARK] Expose range partitioning ...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20456#discussion_r165814225 --- Diff: python/pyspark/sql/dataframe.py --- @@ -667,6 +667,92 @@ def repartition(self, numPartitions, *cols): else: raise TypeError("numPartitions should be an int or Column") +@since("2.3.0") +def repartitionByRange(self, numPartitions, *cols): +""" +Returns a new :class:`DataFrame` partitioned by the given partitioning expressions. The +resulting DataFrame is range partitioned. + +``numPartitions`` can be an int to specify the target number of partitions or a Column. +If it is a Column, it will be used as the first partitioning column. If not specified, +the default number of partitions is used. + +At least one partition-by expression must be specified. +When no explicit sort order is specified, "ascending nulls first" is assumed. + +>>> df.repartitionByRange(2, "age").rdd.getNumPartitions() +2 +>>> df.show() ++---+-+ +|age| name| ++---+-+ +| 2|Alice| +| 5| Bob| ++---+-+ +>>> df.repartitionByRange(1, "age").rdd.getNumPartitions() +1 +>>> data = df.union(df) +>>> data.show() ++---+-+ +|age| name| ++---+-+ +| 2|Alice| +| 5| Bob| +| 2|Alice| +| 5| Bob| ++---+-+ +>>> data = data.repartitionByRange(3, "age") +>>> data.show() ++---+-+ +|age| name| ++---+-+ +| 2|Alice| +| 2|Alice| +| 5| Bob| +| 5| Bob| ++---+-+ +>>> data.rdd.getNumPartitions() +3 +>>> data = data.repartitionByRange("age") +>>> data.rdd.getNumPartitions() +3 +>>> data2 = df.union(df).union(df) --- End diff -- How to test data after repartitionByRange("age") --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20456: [SPARK-22624][PYSPARK] Expose range partitioning ...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20456#discussion_r165814229 --- Diff: python/pyspark/sql/dataframe.py --- @@ -667,6 +667,92 @@ def repartition(self, numPartitions, *cols): else: raise TypeError("numPartitions should be an int or Column") +@since("2.3.0") +def repartitionByRange(self, numPartitions, *cols): +""" +Returns a new :class:`DataFrame` partitioned by the given partitioning expressions. The +resulting DataFrame is range partitioned. + +``numPartitions`` can be an int to specify the target number of partitions or a Column. +If it is a Column, it will be used as the first partitioning column. If not specified, +the default number of partitions is used. + +At least one partition-by expression must be specified. +When no explicit sort order is specified, "ascending nulls first" is assumed. + +>>> df.repartitionByRange(2, "age").rdd.getNumPartitions() +2 +>>> df.show() ++---+-+ +|age| name| ++---+-+ +| 2|Alice| +| 5| Bob| ++---+-+ +>>> df.repartitionByRange(1, "age").rdd.getNumPartitions() +1 +>>> data = df.union(df) --- End diff -- How to test data after repartitionByRange("age")? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
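One way to answer "how to test data after repartitionByRange" without depending on row order is to check the range invariant directly: every key in partition i must be <= every key in partition i+1. The helper below is a plain-Python sketch of that check (not a pyspark API), applied to lists that stand in for collected partitions:

```python
def is_range_partitioned(partitions, key=lambda row: row):
    """True if every key in partition i is <= every key in partition i+1,
    i.e. the partitions form non-overlapping, ordered key ranges."""
    non_empty = [sorted(key(r) for r in p) for p in partitions if p]
    return all(prev[-1] <= nxt[0]
               for prev, nxt in zip(non_empty, non_empty[1:]))

# Rows as (age, name) tuples, using the sample data from the doctest above:
ok = is_range_partitioned(
    [[(2, "Alice"), (2, "Alice")], [(5, "Bob"), (5, "Bob")]],
    key=lambda row: row[0])
bad = is_range_partitioned([[(5, "Bob")], [(2, "Alice")]],
                           key=lambda row: row[0])
```

In a real pyspark test the partition lists could come from `df.rdd.glom().collect()`; the invariant check itself stays the same.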
[GitHub] spark issue #20456: [SPARK-22624][PYSPARK] Expose range partitioning shuffle...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20456 @gatorsmile @HyukjinKwon Please review it again, thanks
[GitHub] spark issue #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION should chan...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20249 @gatorsmile @dongjoon-hyun please review it
[GitHub] spark pull request #20456: [SPARK-22624][PYSPARK] Expose range partitioning ...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20456#discussion_r165643385 --- Diff: python/pyspark/sql/dataframe.py --- @@ -667,6 +667,55 @@ def repartition(self, numPartitions, *cols): else: raise TypeError("numPartitions should be an int or Column") +@since("2.3.0") +def repartitionByRange(self, numPartitions, *cols, **kwargs): +""" +Returns a new :class:`DataFrame` partitioned by the given partitioning expressions. The +resulting DataFrame is range partitioned. + +``numPartitions`` can be an int to specify the target number of partitions or a Column. +If it is a Column, it will be used as the first partitioning column. If not specified, +the default number of partitions is used. --- End diff -- ok, done, please review --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20456: [SPARK-22624][PYSPARK] Expose range partitioning ...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20456#discussion_r165643349 --- Diff: python/pyspark/sql/dataframe.py --- @@ -667,6 +667,55 @@ def repartition(self, numPartitions, *cols): else: raise TypeError("numPartitions should be an int or Column") +@since("2.3.0") +def repartitionByRange(self, numPartitions, *cols, **kwargs): +""" +Returns a new :class:`DataFrame` partitioned by the given partitioning expressions. The +resulting DataFrame is range partitioned. + +``numPartitions`` can be an int to specify the target number of partitions or a Column. +If it is a Column, it will be used as the first partitioning column. If not specified, +the default number of partitions is used. + +At least one partition-by expression must be specified. +When no explicit sort order is specified, "ascending nulls first" is assumed. + +>>> df.repartitionByRange(2, "age").rdd.getNumPartitions() +2 +>>> data = df.union(df).repartition(1, "age") +>>> data.rdd.getNumPartitions() +1 +>>> data.show() ++---+-+ +|age| name| ++---+-+ +| 2|Alice| +| 5| Bob| +| 2|Alice| +| 5| Bob| ++---+-+ +>>> data = data.repartitionByRange(3, "age") +>>> data.show() ++---+-+ +|age| name| ++---+-+ +| 2|Alice| +| 2|Alice| +| 5| Bob| +| 5| Bob| ++---+-+ +>>> data.rdd.getNumPartitions() +3 +""" +if isinstance(numPartitions, int): +if len(cols) == 0: +return ValueError("At least one partition-by expression must be specified.") +else: +return DataFrame( +self._jdf.repartitionByRange(numPartitions, self._jcols(*cols)), self.sql_ctx) +else: --- End diff -- ok,done, please review --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20456: [SPARK-22624][PYSPARK] Expose range partitioning ...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20456#discussion_r165638989 --- Diff: python/pyspark/sql/dataframe.py --- @@ -667,6 +667,55 @@ def repartition(self, numPartitions, *cols): else: raise TypeError("numPartitions should be an int or Column") +@since("2.3.0") +def repartitionByRange(self, numPartitions, *cols, **kwargs): +""" +Returns a new :class:`DataFrame` partitioned by the given partitioning expressions. The +resulting DataFrame is range partitioned. + +``numPartitions`` can be an int to specify the target number of partitions or a Column. +If it is a Column, it will be used as the first partitioning column. If not specified, +the default number of partitions is used. + +At least one partition-by expression must be specified. +When no explicit sort order is specified, "ascending nulls first" is assumed. + +>>> df.repartitionByRange(2, "age").rdd.getNumPartitions() +2 +>>> data = df.union(df).repartition(1, "age") +>>> data.rdd.getNumPartitions() +1 +>>> data.show() ++---+-+ +|age| name| ++---+-+ +| 2|Alice| +| 5| Bob| +| 2|Alice| +| 5| Bob| ++---+-+ +>>> data = data.repartitionByRange(3, "age") +>>> data.show() ++---+-+ +|age| name| ++---+-+ +| 2|Alice| +| 2|Alice| +| 5| Bob| +| 5| Bob| ++---+-+ +>>> data.rdd.getNumPartitions() +3 +""" +if isinstance(numPartitions, int): +if len(cols) == 0: +return ValueError("At least one partition-by expression must be specified.") +else: +return DataFrame( +self._jdf.repartitionByRange(numPartitions, self._jcols(*cols)), self.sql_ctx) +else: --- End diff -- it throws TypeError; how to handle that in a test case?
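The diff above has two issues the review touches on: `return ValueError(...)` should be `raise`, and a test needs a way to assert on the raised `TypeError`. A minimal pure-Python sketch of the corrected dispatch (names simplified; the JVM forwarding calls are replaced by tuples purely for illustration):

```python
def repartition_by_range(num_partitions, *cols):
    # Sketch of the argument dispatch only; the real method forwards
    # the call to the JVM DataFrame via self._jdf.repartitionByRange.
    if isinstance(num_partitions, int):
        if len(cols) == 0:
            # raise, not return: a returned exception object is silently ignored
            raise ValueError(
                "At least one partition-by expression must be specified.")
        return ("range", num_partitions, cols)
    elif isinstance(num_partitions, str):
        # a column given in the first slot becomes the first partitioning column
        return ("range", None, (num_partitions,) + cols)
    else:
        raise TypeError("numPartitions should be an int or Column")

# A test handles the raised TypeError by catching it and checking the message:
try:
    repartition_by_range(1.5, "age")
    err = None
except TypeError as e:
    err = str(e)
```

In pyspark's own test suite the same pattern is usually written with `self.assertRaises(TypeError, ...)`, which wraps this try/except for you.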
[GitHub] spark pull request #20456: [SPARK-22624][PYSPARK] Expose range partitioning ...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20456#discussion_r165627387 --- Diff: python/pyspark/sql/dataframe.py --- @@ -667,6 +667,55 @@ def repartition(self, numPartitions, *cols): else: raise TypeError("numPartitions should be an int or Column") +@since("2.3.0") +def repartitionByRange(self, numPartitions, *cols, **kwargs): --- End diff -- sorry, it's unused; removed it
[GitHub] spark pull request #20456: [SPARK-22624][PYSPARK] Expose range partitioning ...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20456#discussion_r165627994 --- Diff: python/pyspark/sql/dataframe.py --- @@ -667,6 +667,55 @@ def repartition(self, numPartitions, *cols): else: raise TypeError("numPartitions should be an int or Column") +@since("2.3.0") +def repartitionByRange(self, numPartitions, *cols, **kwargs): +""" +Returns a new :class:`DataFrame` partitioned by the given partitioning expressions. The +resulting DataFrame is range partitioned. + +``numPartitions`` can be an int to specify the target number of partitions or a Column. +If it is a Column, it will be used as the first partitioning column. If not specified, +the default number of partitions is used. + +At least one partition-by expression must be specified. +When no explicit sort order is specified, "ascending nulls first" is assumed. + +>>> df.repartitionByRange(2, "age").rdd.getNumPartitions() +2 +>>> data = df.union(df).repartition(1, "age") --- End diff -- ok, change it to repartitionByRange --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20456: [SPARK-22624][PYSPARK] Expose range partitioning shuffle...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20456 @gatorsmile please review it.
[GitHub] spark pull request #20456: [SPARK-22624][PYSPARK] Expose range partitioning ...
GitHub user xubo245 opened a pull request: https://github.com/apache/spark/pull/20456 [SPARK-22624][PYSPARK] Expose range partitioning shuffle introduced by spark-22614 ## What changes were proposed in this pull request? Expose range partitioning shuffle introduced by spark-22614 ## How was this patch tested? Unit test in dataframe.py Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/xubo245/spark SPARK22624_PysparkRangePartition Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20456.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20456 commit 8aaeee331df8b57a36238212eecf238e0c093d93 Author: xubo245 <601450868@...> Date: 2018-01-31T14:29:50Z [SPARK-22624][PYSPARK] Expose range partitioning shuffle introduced by spark-22614
[GitHub] spark issue #20250: [SPARK-23059][SQL][TEST] Correct some improper with view...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20250 Thanks.
[GitHub] spark issue #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION should chan...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20249 @gatorsmile Please review it
[GitHub] spark issue #20250: [SPARK-23059][SQL][TEST] Correct some improper with view...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20250 @gatorsmile @jiangxb1987 @dongjoon-hyun Can this PR be merged? I will fix it if there are any problems.
[GitHub] spark issue #20260: [SPARK-23039][SQL] Finish TODO work in alter table set l...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20260 I will fix the error in this PR after https://github.com/apache/spark/pull/20249#issuecomment-358720962 is merged
[GitHub] spark issue #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION should chan...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20249 @gatorsmile Tests pass, please review it.
[GitHub] spark pull request #20227: [SPARK-23035][SQL] Fix improper information of Te...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20227#discussion_r161981394 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AlreadyExistException.scala --- @@ -31,7 +31,7 @@ class TableAlreadyExistsException(db: String, table: String) extends AnalysisException(s"Table or view '$table' already exists in database '$db'") class TempTableAlreadyExistsException(table: String) --- End diff -- I think we should rename it. But @gatorsmile said "We do not want to introduce a new exception type. In contrast, we planned to remove all these exception sub-types because PySpark might output a confusing error message.". So I revert it.
[GitHub] spark pull request #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION shou...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20249#discussion_r161648924 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala --- @@ -751,6 +751,25 @@ class HiveDDLSuite } } + test("SPARK-23057: SET LOCATION should change the path of partition in table") { +withTable("boxes") { + sql("CREATE TABLE boxes (height INT, length INT) PARTITIONED BY (width INT)") + sql("INSERT OVERWRITE TABLE boxes PARTITION (width=4) SELECT 4, 4") + val expected = "/path/to/part/ways" + sql(s"ALTER TABLE boxes PARTITION (width=4) SET LOCATION '$expected'") + val catalog = spark.sessionState.catalog --- End diff -- Good idea, I will try
[GitHub] spark pull request #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION shou...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20249#discussion_r161646360 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -1869,6 +1869,65 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { } } + test("SPARK-23057: SET LOCATION for managed table with partition") { +withTable("tbl_partition") { + withTempDir { dir => +sql("CREATE TABLE tbl_partition(col1 INT, col2 INT) USING parquet PARTITIONED BY (col1)") +sql("INSERT INTO tbl_partition PARTITION(col1=1) SELECT 11") +sql("INSERT INTO tbl_partition PARTITION(col1=2) SELECT 22") +checkAnswer(spark.table("tbl_partition"), Seq(Row(11, 1), Row(22, 2))) +val defaultTablePath = spark.sessionState.catalog + .getTableMetadata(TableIdentifier("tbl_partition")).storage.locationUri.get +try { + // before set location of partition col1 =1 and 2 + checkPath(defaultTablePath.toString, Map("col1" -> "1"), "tbl_partition") + checkPath(defaultTablePath.toString, Map("col1" -> "2"), "tbl_partition") + val path = dir.getCanonicalPath + + // set location of partition col1 =1 + sql(s"ALTER TABLE tbl_partition PARTITION (col1='1') SET LOCATION '$path'") + checkPath(dir.getCanonicalPath, Map("col1" -> "1"), "tbl_partition") --- End diff -- ok, done
[GitHub] spark pull request #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION shou...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20249#discussion_r161645520 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -800,6 +802,15 @@ case class AlterTableSetLocationCommand( CommandUtils.updateTableStats(sparkSession, table) Seq.empty[Row] } + + private def updatePathInProps( + storage: CatalogStorageFormat, + newPath: Option[String]): Map[String, String] = { --- End diff -- ok, done
[GitHub] spark pull request #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION shou...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20249#discussion_r161645457 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -1869,6 +1869,65 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { } } + test("SPARK-23057: SET LOCATION for managed table with partition") { +withTable("tbl_partition") { + withTempDir { dir => +sql("CREATE TABLE tbl_partition(col1 INT, col2 INT) USING parquet PARTITIONED BY (col1)") +sql("INSERT INTO tbl_partition PARTITION(col1=1) SELECT 11") +sql("INSERT INTO tbl_partition PARTITION(col1=2) SELECT 22") +checkAnswer(spark.table("tbl_partition"), Seq(Row(11, 1), Row(22, 2))) +val defaultTablePath = spark.sessionState.catalog + .getTableMetadata(TableIdentifier("tbl_partition")).storage.locationUri.get +try { + // before set location of partition col1 =1 and 2 + checkPath(defaultTablePath.toString, Map("col1" -> "1"), "tbl_partition") + checkPath(defaultTablePath.toString, Map("col1" -> "2"), "tbl_partition") + val path = dir.getCanonicalPath + + // set location of partition col1 =1 + sql(s"ALTER TABLE tbl_partition PARTITION (col1='1') SET LOCATION '$path'") + checkPath(dir.getCanonicalPath, Map("col1" -> "1"), "tbl_partition") + checkPath(defaultTablePath.toString, Map("col1" -> "2"), "tbl_partition") + + // set location of partition col1 =2 + sql(s"ALTER TABLE tbl_partition PARTITION (col1='2') SET LOCATION '$path'") + checkPath(dir.getCanonicalPath, Map("col1" -> "1"), "tbl_partition") + checkPath(dir.getCanonicalPath, Map("col1" -> "2"), "tbl_partition") + + spark.catalog.refreshTable("tbl_partition") + // SET LOCATION won't move data from previous table path to new table path. + assert(spark.table("tbl_partition").count() == 0) --- End diff -- ok, thanks
[GitHub] spark pull request #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION shou...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20249#discussion_r161645330 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -1869,6 +1869,65 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { } } + test("SPARK-23057: SET LOCATION for managed table with partition") { +withTable("tbl_partition") { + withTempDir { dir => +sql("CREATE TABLE tbl_partition(col1 INT, col2 INT) USING parquet PARTITIONED BY (col1)") +sql("INSERT INTO tbl_partition PARTITION(col1=1) SELECT 11") +sql("INSERT INTO tbl_partition PARTITION(col1=2) SELECT 22") +checkAnswer(spark.table("tbl_partition"), Seq(Row(11, 1), Row(22, 2))) +val defaultTablePath = spark.sessionState.catalog + .getTableMetadata(TableIdentifier("tbl_partition")).storage.locationUri.get +try { + // before set location of partition col1 =1 and 2 + checkPath(defaultTablePath.toString, Map("col1" -> "1"), "tbl_partition") + checkPath(defaultTablePath.toString, Map("col1" -> "2"), "tbl_partition") + val path = dir.getCanonicalPath + + // set location of partition col1 =1 + sql(s"ALTER TABLE tbl_partition PARTITION (col1='1') SET LOCATION '$path'") + checkPath(dir.getCanonicalPath, Map("col1" -> "1"), "tbl_partition") + checkPath(defaultTablePath.toString, Map("col1" -> "2"), "tbl_partition") + + // set location of partition col1 =2 + sql(s"ALTER TABLE tbl_partition PARTITION (col1='2') SET LOCATION '$path'") + checkPath(dir.getCanonicalPath, Map("col1" -> "1"), "tbl_partition") + checkPath(dir.getCanonicalPath, Map("col1" -> "2"), "tbl_partition") + + spark.catalog.refreshTable("tbl_partition") + // SET LOCATION won't move data from previous table path to new table path. + assert(spark.table("tbl_partition").count() == 0) + // the previous table path should still be there. + assert(new File(defaultTablePath).exists()) + + sql("INSERT INTO tbl_partition PARTITION(col1=2) SELECT 33") + // newly inserted data will go to the new table path. + assert(dir.listFiles().nonEmpty) + + sql("DROP TABLE tbl_partition") + // the new table path will be removed after DROP TABLE. + assert(!dir.exists()) +} finally { + Utils.deleteRecursively(new File(defaultTablePath)) +} + } +} + } + + def checkPath(path: String, partSpec: Map[String, String], table: String): Unit = { +val catalog = spark.sessionState.catalog +val spec = Some(partSpec) --- End diff -- ok, done
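The semantics this test pins down — SET LOCATION repoints the partition's metadata while leaving existing files where they are — can be sketched with a toy catalog in plain Python (the dict-based catalog and file layout are an illustration, not Spark's API):

```python
import os
import tempfile

# Toy catalog: partition spec -> storage location (a stand-in for the metastore).
catalog = {}

old_dir = tempfile.mkdtemp()
new_dir = tempfile.mkdtemp()

# "INSERT" writes data into the partition's original location.
with open(os.path.join(old_dir, "part-00000"), "w") as f:
    f.write("11")
catalog[("col1", "1")] = old_dir

# "ALTER TABLE ... PARTITION ... SET LOCATION" only rewrites the metadata entry.
catalog[("col1", "1")] = new_dir

# The old files were not moved, and the new location starts out empty,
# so a scan through the catalog now sees zero rows for this partition.
assert os.listdir(old_dir) == ["part-00000"]
assert os.listdir(new_dir) == []
```

This is why the Spark test expects `spark.table("tbl_partition").count() == 0` right after the location change, and why subsequent inserts land in the new directory.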
[GitHub] spark issue #20250: [SPARK-23059][SQL][TEST] Correct some improper with view...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20250 @dongjoon-hyun @gatorsmile Please review it. Thanks
[GitHub] spark issue #20227: [SPARK-23035][SQL] Fix improper information of TempTable...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20227 @dongjoon-hyun ok, done. The JIRA webpage shows "Maintenance in progress"
[GitHub] spark issue #20227: [SPARK-23035][SQL] Fix warning: TEMPORARY TABLE ... USIN...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20227 Retest this please.
[GitHub] spark pull request #20227: [SPARK-23035][SQL] Fix warning: TEMPORARY TABLE ....
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20227#discussion_r161386989 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -883,6 +908,41 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { } } +test("rename temporary view - destination table already exists, with sql: CREATE TEMPORARY view") { --- End diff -- ok, done
[GitHub] spark issue #20260: [SPARK-23039][SQL] Finish TODO work in alter table set l...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20260 @gatorsmile Please review it. This is your TODO work.
[GitHub] spark pull request #20260: [SPARK-23039][SQL] Finish TODO work in alter tabl...
GitHub user xubo245 opened a pull request: https://github.com/apache/spark/pull/20260 [SPARK-23039][SQL] Finish TODO work in alter table set location. ## What changes were proposed in this pull request? Finish TODO work in alter table set location. org.apache.spark.sql.execution.command.DDLSuite#testSetLocation // TODO(gatorsmile): fix the bug in alter table set location. //if (isUsingHiveMetastore) { //assert(storageFormat.properties.get("path") === expected) // } fix it by removing newPath = None in org.apache.spark.sql.hive.HiveExternalCatalog#restoreDataSourceTable ## How was this patch tested? test("SPARK-23039: check path after SET LOCATION") Wait for https://github.com/apache/spark/pull/20249 You can merge this pull request into a Git repository by running: $ git pull https://github.com/xubo245/spark setLocationTODO Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20260.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20260 commit 76c1813cf6e0e0e0d085cd31dcf1633c80829eff Author: xubo245 <601450868@...> Date: 2018-01-13T13:53:52Z [SPARK-23039][SQL] Fix the bug in alter table set location. TODO work: Fix the bug in alter table set location. org.apache.spark.sql.execution.command.DDLSuite#testSetLocation // TODO(gatorsmile): fix the bug in alter table set location. //if (isUsingHiveMetastore) { //assert(storageFormat.properties.get("path") === expected) // }
[GitHub] spark pull request #20227: [SPARK-23035][SQL] Fix warning: TEMPORARY TABLE ....
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20227#discussion_r161367362 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -814,7 +814,7 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { withTempView("tab1") { sql( """ - |CREATE TEMPORARY TABLE tab1 --- End diff -- Ok, done. I kept the old test cases for test coverage, and added new test cases for the temp view
[GitHub] spark pull request #20227: [SPARK-23035][SQL] Fix warning: TEMPORARY TABLE ....
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20227#discussion_r161367322 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AlreadyExistException.scala --- @@ -33,6 +33,9 @@ class TableAlreadyExistsException(db: String, table: String) class TempTableAlreadyExistsException(table: String) extends AnalysisException(s"Temporary table '$table' already exists") +class TempViewAlreadyExistsException(table: String) + extends AnalysisException(s"Temporary view '$table' already exists") --- End diff -- ok, I will remove the new exception
[GitHub] spark pull request #20227: [SPARK-23035][SQL] Fix warning: TEMPORARY TABLE ....
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20227#discussion_r161366875 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AlreadyExistException.scala --- @@ -33,6 +33,9 @@ class TableAlreadyExistsException(db: String, table: String) class TempTableAlreadyExistsException(table: String) extends AnalysisException(s"Temporary table '$table' already exists") --- End diff -- How about the class name? TempTableAlreadyExistsException
[GitHub] spark issue #20228: [SPARK-23036][SQL][TEST] Add withGlobalTempView for test...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20228 @gatorsmile Sure. This is only for TEST. Done, I put '[SQL]' into the title too
[GitHub] spark issue #20228: [SPARK-23036][SQL][TEST] Add withGlobalTempView for test...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20228 ok, done, I put '[SQL]' into the title. @dongjoon-hyun
[GitHub] spark issue #20250: [SPARK-23059][SQL][TEST] Correct some improper with view...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20250 @dongjoon-hyun Please review it.
[GitHub] spark pull request #20228: [SPARK-23036][SQL][TEST] Add withGlobalTempView f...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20228#discussion_r161253370 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/GlobalTempViewSuite.scala --- @@ -140,8 +140,8 @@ class GlobalTempViewSuite extends QueryTest with SharedSQLContext { assert(spark.catalog.listTables(globalTempDB).collect().toSeq.map(_.name) == Seq("v1", "v2")) } finally { - spark.catalog.dropTempView("v1") - spark.catalog.dropGlobalTempView("v2") + spark.catalog.dropGlobalTempView("v1") + spark.catalog.dropTempView("v2") --- End diff -- Ok, done. Please review: https://github.com/apache/spark/pull/20250
[GitHub] spark pull request #20250: [SPARK-23059][SQL][TEST] Correct some improper wi...
GitHub user xubo245 opened a pull request: https://github.com/apache/spark/pull/20250 [SPARK-23059][SQL][TEST] Correct some improper with view related method usage ## What changes were proposed in this pull request? Correct some improper with view related method usage Only change test cases like: ``` test("list global temp views") { try { sql("CREATE GLOBAL TEMP VIEW v1 AS SELECT 3, 4") sql("CREATE TEMP VIEW v2 AS SELECT 1, 2") checkAnswer(sql(s"SHOW TABLES IN $globalTempDB"), Row(globalTempDB, "v1", true) :: Row("", "v2", true) :: Nil) assert(spark.catalog.listTables(globalTempDB).collect().toSeq.map(_.name) == Seq("v1", "v2")) } finally { spark.catalog.dropTempView("v1") spark.catalog.dropGlobalTempView("v2") } } ``` ## How was this patch tested? See test case. You can merge this pull request into a Git repository by running: $ git pull https://github.com/xubo245/spark DropTempViewError Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20250.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20250 commit 8e49445f68db89b8a01b3eeb9c6da74191bc9a86 Author: xubo245 <601450868@...> Date: 2018-01-12T15:34:32Z [SPARK-23059][SQL][TEST] Correct some improper with view related method usage
[GitHub] spark issue #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION should chan...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20249 @gatorsmile Please review it
[GitHub] spark pull request #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION shou...
GitHub user xubo245 opened a pull request: https://github.com/apache/spark/pull/20249 [SPARK-23057][SPARK-19235][SQL] SET LOCATION should change the path of partition in table ## What changes were proposed in this pull request? Fix error of SET LOCATION: SET LOCATION should change the path of the partition in the table ## How was this patch tested? add test cases: test("SPARK-23057: path option always represent the value of table location with partition") test("SPARK-23057: SET LOCATION for managed table with partition") test("SPARK-23057: SET LOCATION should change the path of partition in table") You can merge this pull request into a Git repository by running: $ git pull https://github.com/xubo245/spark setPartitionPath Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20249.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20249 commit ff21db1cc3ccdef8b1028583ecb11ca0e27c2e7d Author: xubo245 <601450868@...> Date: 2018-01-12T14:52:21Z [SPARK-23057][SPARK-19235][SQL] SET LOCATION should change the path of partition in table
[GitHub] spark pull request #16592: [SPARK-19235] [SQL] [TESTS] Enable Test Cases in ...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/16592#discussion_r160889706 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -1082,24 +1173,21 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach { convertToDatasourceTable(catalog, tableIdent) } assert(catalog.getTableMetadata(tableIdent).storage.locationUri.isDefined) -assert(catalog.getTableMetadata(tableIdent).storage.properties.isEmpty) + assert(normalizeSerdeProp(catalog.getTableMetadata(tableIdent).storage.properties).isEmpty) assert(catalog.getPartition(tableIdent, partSpec).storage.locationUri.isDefined) -assert(catalog.getPartition(tableIdent, partSpec).storage.properties.isEmpty) +assert( + normalizeSerdeProp(catalog.getPartition(tableIdent, partSpec).storage.properties).isEmpty) + // Verify that the location is set to the expected string def verifyLocation(expected: URI, spec: Option[TablePartitionSpec] = None): Unit = { val storageFormat = spec .map { s => catalog.getPartition(tableIdent, s).storage } .getOrElse { catalog.getTableMetadata(tableIdent).storage } - if (isDatasourceTable) { -if (spec.isDefined) { - assert(storageFormat.properties.isEmpty) - assert(storageFormat.locationUri === Some(expected)) -} else { - assert(storageFormat.locationUri === Some(expected)) -} - } else { -assert(storageFormat.locationUri === Some(expected)) - } + // TODO(gatorsmile): fix the bug in alter table set location. + // if (isUsingHiveMetastore) { + // assert(storageFormat.properties.get("path") === expected) --- End diff -- Do we need to fix this bug and satisfy this test case? When porting these test cases, a bug in SET LOCATION was found: path is not set when the location is changed.
[GitHub] spark issue #20195: [SPARK-22972][SQL] Couldn't find corresponding Hive SerD...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20195 ok @dongjoon-hyun
[GitHub] spark pull request #20195: [SPARK-22972][SQL] Couldn't find corresponding Hi...
Github user xubo245 closed the pull request at: https://github.com/apache/spark/pull/20195
[GitHub] spark pull request #20228: [SPARK-23036] Add withGlobalTempView for testing ...
GitHub user xubo245 opened a pull request: https://github.com/apache/spark/pull/20228 [SPARK-23036] Add withGlobalTempView for testing and correct some improper with view related method usage ## What changes were proposed in this pull request? Add withGlobalTempView when create global temp view, like withTempView and withView. And correct some improper usage. ## How was this patch tested? no new test. You can merge this pull request into a Git repository by running: $ git pull https://github.com/xubo245/spark DropTempView Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20228.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20228 commit fffd109e8c084f9a4d63840bf761364f1ede5dc9 Author: xubo245 <601450868@...> Date: 2018-01-11T03:25:17Z [SPARK-23036] Add withGlobalTempView for testing and correct some improper with view related method usage Add withGlobalTempView when create global temp view, like withTempView and withView. And correct some improper usage.
[GitHub] spark pull request #20227: [SPARK-23035] Fix warning: TEMPORARY TABLE ... US...
GitHub user xubo245 opened a pull request: https://github.com/apache/spark/pull/20227 [SPARK-23035] Fix warning: TEMPORARY TABLE ... USING ... is deprecated and use TempViewAlreadyExistsException when create temp view Fix warning: TEMPORARY TABLE ... USING ... is deprecated and use TempViewAlreadyExistsException when create temp view There are warnings when running test: test("rename temporary view - destination table with database name") Another problem: it throws TempTableAlreadyExistsException and outputs "Temporary table '$table' already exists" when we create a temp view by using org.apache.spark.sql.catalyst.catalog.GlobalTempViewManager#create, which is improper. ## What changes were proposed in this pull request? Fix some warnings by changing "TEMPORARY TABLE ... USING ... " to "TEMPORARY VIEW ... USING ... " Fix improper information about TempTableAlreadyExistsException when create temp view ## How was this patch tested? use old test cases, such as " test("create temporary view using") " You can merge this pull request into a Git repository by running: $ git pull https://github.com/xubo245/spark fixDeprecated Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20227.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20227 commit b97834a58fb2a0a98eb2645bd9e77e97209b Author: xubo245 <601450868@...> Date: 2018-01-11T01:58:48Z [SPARK-23035] Fix warning: TEMPORARY TABLE ... USING ... is deprecated and use TempViewAlreadyExistsException when create temp view Fix warning: TEMPORARY TABLE ... USING ... is deprecated and use TempViewAlreadyExistsException when create temp view There are warnings when running test: test("rename temporary view - destination table with database name") Another problem: it throws TempTableAlreadyExistsException and outputs "Temporary table '$table' already exists" when we create a temp view by using org.apache.spark.sql.catalyst.catalog.GlobalTempViewManager#create, which is improper.
[GitHub] spark issue #20195: [SPARK-22972] Couldn't find corresponding Hive SerDe for...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20195 I submitted a separate PR for 2.2 here; please review it. @gatorsmile
[GitHub] spark pull request #20195: [SPARK-22972] Couldn't find corresponding Hive Se...
GitHub user xubo245 opened a pull request: https://github.com/apache/spark/pull/20195 [SPARK-22972] Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.hive.orc ## What changes were proposed in this pull request? Fix the warning: Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.hive.orc. For branch-2.2, it is cherry-picked from https://github.com/apache/spark/commit/8032cf852fccd0ab8754f633affdc9ba8fc99e58 ## How was this patch tested? test("SPARK-22972: hive orc source") You can merge this pull request into a Git repository by running: $ git pull https://github.com/xubo245/spark HiveSerDeForBranch2.2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20195.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20195 commit c65f3efd6270adc5c8708e100263379758fd5d82 Author: xubo245 <601450868@...> Date: 2018-01-09T02:15:01Z [SPARK-22972] Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.hive.orc Fix the warning: Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.hive.orc. For branch-2.2, it is cherry-picked from https://github.com/apache/spark/commit/8032cf852fccd0ab8754f633affdc9ba8fc99e58 test("SPARK-22972: hive orc source") assert(HiveSerDe.sourceToSerDe("org.apache.spark.sql.hive.orc") .equals(HiveSerDe.sourceToSerDe("orc"))) Author: xubo245 <601450...@qq.com> Closes #20165 from xubo245/HiveSerDe.
[GitHub] spark issue #20165: [SPARK-22972] Couldn't find corresponding Hive SerDe for...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20165 Thank you too. Ok, I will raise a PR for this later.
[GitHub] spark pull request #20165: [SPARK-22972] Couldn't find corresponding Hive Se...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20165#discussion_r160135992 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala --- @@ -62,6 +64,33 @@ class HiveOrcSourceSuite extends OrcSuite with TestHiveSingleton { """.stripMargin) } + test("SPARK-22972: hive orc source") { +val tableName = "normal_orc_as_source_hive" +withTable(tableName) { + --- End diff -- ok
[GitHub] spark pull request #20165: [SPARK-22972] Couldn't find corresponding Hive Se...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20165#discussion_r160136011 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala --- @@ -62,6 +64,33 @@ class HiveOrcSourceSuite extends OrcSuite with TestHiveSingleton { """.stripMargin) } + test("SPARK-22972: hive orc source") { +val tableName = "normal_orc_as_source_hive" +withTable(tableName) { + + sql( +s"""CREATE TABLE $tableName + |USING org.apache.spark.sql.hive.orc + |OPTIONS ( + | PATH '${new File(orcTableAsDir.getAbsolutePath).toURI}' + |) + """.stripMargin) --- End diff -- ok
[GitHub] spark pull request #20165: [SPARK-22972] Couldn't find corresponding Hive Se...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20165#discussion_r160067418 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala --- @@ -62,6 +63,22 @@ class HiveOrcSourceSuite extends OrcSuite with TestHiveSingleton { """.stripMargin) } + test("SPARK-22972: hive orc source") { +spark.sql( --- End diff -- done
[GitHub] spark pull request #20165: [SPARK-22972] Couldn't find corresponding Hive Se...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20165#discussion_r160067425 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala --- @@ -62,6 +63,22 @@ class HiveOrcSourceSuite extends OrcSuite with TestHiveSingleton { """.stripMargin) } + test("SPARK-22972: hive orc source") { +spark.sql( + s"""CREATE TABLE normal_orc_as_source_hive --- End diff -- done
[GitHub] spark pull request #20165: [SPARK-22972] Couldn't find corresponding Hive Se...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20165#discussion_r160067384 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala --- @@ -62,6 +63,22 @@ class HiveOrcSourceSuite extends OrcSuite with TestHiveSingleton { """.stripMargin) } + test("SPARK-22972: hive orc source") { +spark.sql( + s"""CREATE TABLE normal_orc_as_source_hive + |USING org.apache.spark.sql.hive.orc + |OPTIONS ( + | PATH '${new File(orcTableAsDir.getAbsolutePath).toURI}' + |) + """.stripMargin) +spark.sql("desc formatted normal_orc_as_source_hive").show() --- End diff -- done
[GitHub] spark pull request #20165: [SPARK-22972] Couldn't find corresponding Hive Se...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20165#discussion_r159879314 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala --- @@ -62,6 +63,22 @@ class HiveOrcSourceSuite extends OrcSuite with TestHiveSingleton { """.stripMargin) } + test("SPARK-22972: hive orc source") { +spark.sql( + s"""CREATE TABLE normal_orc_as_source_hive + |USING org.apache.spark.sql.hive.orc + |OPTIONS ( + | PATH '${new File(orcTableAsDir.getAbsolutePath).toURI}' + |) + """. +stripMargin) +spark.sql( + "desc formatted normal_orc_as_source_hive").show() +checkAnswer(sql("SELECT COUNT(*) FROM normal_orc_as_source_hive"), Row(10)) +assert(HiveSerDe.sourceToSerDe("org.apache.spark.sql.hive.orc") + .equals(HiveSerDe.sourceToSerDe("orc"))) --- End diff -- ok
[GitHub] spark pull request #20165: [SPARK-22972] Couldn't find corresponding Hive Se...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20165#discussion_r159879115 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala --- @@ -62,6 +63,22 @@ class HiveOrcSourceSuite extends OrcSuite with TestHiveSingleton { """.stripMargin) } + test("SPARK-22972: hive orc source") { +spark.sql( + s"""CREATE TABLE normal_orc_as_source_hive + |USING org.apache.spark.sql.hive.orc + |OPTIONS ( + | PATH '${new File(orcTableAsDir.getAbsolutePath).toURI}' + |) + """. +stripMargin) +spark.sql( + "desc formatted normal_orc_as_source_hive").show() --- End diff -- I changed it to spark.sql("desc formatted normal_orc_as_source_hive").show(); is that ok? How can I capture the warning and verify it in code?
[GitHub] spark pull request #20165: [SPARK-22972] Couldn't find corresponding Hive Se...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20165#discussion_r159878689 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala --- @@ -62,6 +63,22 @@ class HiveOrcSourceSuite extends OrcSuite with TestHiveSingleton { """.stripMargin) } + test("SPARK-22972: hive orc source") { +spark.sql( + s"""CREATE TABLE normal_orc_as_source_hive + |USING org.apache.spark.sql.hive.orc + |OPTIONS ( + | PATH '${new File(orcTableAsDir.getAbsolutePath).toURI}' + |) + """. +stripMargin) --- End diff -- ok
[GitHub] spark pull request #20165: [SPARK-22972] Couldn't find corresponding Hive Se...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20165#discussion_r159878683 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/HiveSerDe.scala --- @@ -72,7 +72,7 @@ object HiveSerDe { def sourceToSerDe(source: String): Option[HiveSerDe] = { val key = source.toLowerCase(Locale.ROOT) match { case s if s.startsWith("org.apache.spark.sql.parquet") => "parquet" - case s if s.startsWith("org.apache.spark.sql.orc") => "orc" --- End diff -- ok
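The diff above is about making the hive-internal ORC provider name resolve to the same SerDe key as the public one. A minimal standalone sketch of the idea follows; the method name and the plain String result are simplified stand-ins, not Spark's actual HiveSerDe API, which returns Option[HiveSerDe]:

```scala
import java.util.Locale

// Simplified stand-in for HiveSerDe.sourceToSerDe: map a data source
// provider name to a SerDe key.
def sourceToSerDeKey(source: String): Option[String] =
  source.toLowerCase(Locale.ROOT) match {
    case s if s.startsWith("org.apache.spark.sql.parquet") => Some("parquet")
    case s if s.startsWith("org.apache.spark.sql.orc") => Some("orc")
    // The added case: "org.apache.spark.sql.hive.orc" does not share the
    // "org.apache.spark.sql.orc" prefix, so it needs its own match to
    // avoid the "Couldn't find corresponding Hive SerDe" warning.
    case s if s.startsWith("org.apache.spark.sql.hive.orc") => Some("orc")
    case "orc" => Some("orc")
    case "parquet" => Some("parquet")
    case _ => None
  }
```

With this extra case, the hive-internal provider name and the short "orc" alias resolve to the same key, which is exactly what the PR's assertion checks.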
[GitHub] spark pull request #20165: [SPARK-22972] Couldn't find corresponding Hive Se...
GitHub user xubo245 opened a pull request: https://github.com/apache/spark/pull/20165 [SPARK-22972] Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.hive.orc ## What changes were proposed in this pull request? Fix the warning: Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.hive.orc. (Please fill in changes proposed in this fix) ## How was this patch tested? test("SPARK-22972: hive orc source") assert(HiveSerDe.sourceToSerDe("org.apache.spark.sql.hive.orc") .equals(HiveSerDe.sourceToSerDe("orc"))) (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/xubo245/spark HiveSerDe Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20165.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20165 commit fa902d6d3fb635236ac01ee5b43470359f16cfdd Author: xubo245 <601450868@...> Date: 2018-01-05T13:20:53Z [SPARK-22972] Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.hive.orc. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20044: [SPARK-22857] Optimize code by inspecting code
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20044 @srowen I found all the related array-size warnings: ![4](https://user-images.githubusercontent.com/8759816/34340165-ccdb3c7a-e9b9-11e7-827f-484283ce97f1.PNG) Are there any other issues?
[GitHub] spark issue #20044: [SPARK-22857] Optimize code by inspecting code
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20044 Ok, I removed some changes.
[GitHub] spark issue #20044: [SPARK-22857] Optimize code by inspecting code
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20044 ok
[GitHub] spark pull request #20044: [SPARK-22857] Optimize code by inspecting code
GitHub user xubo245 opened a pull request: https://github.com/apache/spark/pull/20044 [SPARK-22857] Optimize code by inspecting code ## What changes were proposed in this pull request? Optimize code by inspecting code, including: remove some unused imports; change array size method calls to the length method; use the head method. You can merge this pull request into a Git repository by running: $ git pull https://github.com/xubo245/spark spark2.2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20044.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20044 commit aab41182a333b6f6b3e58624f896dfa668d75842 Author: xubo245 <601450868@...> Date: 2017-11-27T12:26:58Z [SPARK-22857] Optimize code by inspecting code Optimize code by inspecting code, including: remove some unused imports; change array size method calls to the length method; use the head method
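The inspection-driven cleanups listed in the PR description can be illustrated with a small standalone snippet; the values here are made up for demonstration and are not from the Spark codebase:

```scala
// Before/after for the cleanups above.
val xs = Array(1, 2, 3)
val sizeViaWrapper = xs.size   // before: .size on an Array goes through an implicit ArrayOps wrapper
val lengthDirect = xs.length   // after: .length reads the array's length field directly

val seq = Seq(10, 20, 30)
val first = seq.head           // after: seq.head instead of seq(0) for the first element
```

Both forms return the same values; the point of the cleanup is idiom and avoiding the implicit-conversion indirection, not behavior.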
[GitHub] spark issue #19639: [SPARK-22423][SQL] Scala test source files like TestHive...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/19639 I have fixed all four instances and updated the PR title.
[GitHub] spark pull request #19639: [SPARK-22423][SQL] The TestHiveSingleton.scala fi...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/19639#discussion_r148764458 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/test/TestHiveSingleton.scala --- @@ -24,7 +24,6 @@ import org.apache.spark.sql.SparkSession import org.apache.spark.sql.hive.HiveExternalCatalog import org.apache.spark.sql.hive.client.HiveClient - --- End diff -- ok
[GitHub] spark issue #19639: [SPARK-22423][SQL] The TestHiveSingleton.scala file shou...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/19639 Ok, I will fix all of them and update the PR title later.
[GitHub] spark pull request #19639: [SPARK-22423][SQL] The TestHiveSingleton.scala fi...
GitHub user xubo245 opened a pull request: https://github.com/apache/spark/pull/19639 [SPARK-22423][SQL] The TestHiveSingleton.scala file should be in scala directory ## What changes were proposed in this pull request? The TestHiveSingleton.scala file is moved from the java directory into the scala directory. ## How was this patch tested? It is a base test class for Hive; no new test case in this PR. You can merge this pull request into a Git repository by running: $ git pull https://github.com/xubo245/spark scalaDirectory Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19639.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19639 commit bfda9345c618a9656ccbe7b472e9e1963d325b45 Author: xubo245 <601450...@qq.com> Date: 2017-11-02T07:30:52Z [SPARK-22423][SQL] The TestHiveSingleton.scala file should be in scala directory
[GitHub] spark pull request #14422: Add rand(numRows: Int, numCols: Int) functions
Github user xubo245 closed the pull request at: https://github.com/apache/spark/pull/14422 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
[GitHub] spark pull request #14422: Add rand(numRows: Int, numCols: Int) functions
GitHub user xubo245 reopened a pull request: https://github.com/apache/spark/pull/14422 Add rand(numRows: Int, numCols: Int) functions ## What changes were proposed in this pull request? Add rand(numRows: Int, numCols: Int) functions to the DenseMatrix object, like breeze.linalg.DenseMatrix.rand() You can merge this pull request into a Git repository by running: $ git pull https://github.com/xubo245/spark patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14422.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14422 commit a7a1261f52112a3bca375dd0bed1c1bc0a2e0ed8 Author: 徐波 <601450...@qq.com> Date: 2016-07-30T15:43:36Z Add rand(numRows: Int, numCols: Int) functions add rand(numRows: Int, numCols: Int) functions to the DenseMatrix object, like breeze.linalg.DenseMatrix.rand() commit 054b70ccce73c02cce04caf9f7958cfc555df829 Author: 徐波 <601450...@qq.com> Date: 2016-07-30T16:36:30Z fix RNG fix RNG; this uses one RNG for all elements
[GitHub] spark issue #14422: Add rand(numRows: Int, numCols: Int) functions
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/14422 ok
[GitHub] spark pull request #14422: Add rand(numRows: Int, numCols: Int) functions
Github user xubo245 closed the pull request at: https://github.com/apache/spark/pull/14422
[GitHub] spark issue #14422: Add rand(numRows: Int, numCols: Int) functions
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/14422 @srowen sorry, please close the issue. I will learn more before my next PR. This PR was only because breeze has the function; in Spark there is no use for it. Could you point me to some starter issues, please?
[GitHub] spark issue #14422: Add rand(numRows: Int, numCols: Int) functions
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/14422 @HyukjinKwon Thank you. This is my first pull request to Spark. Sorry, I will follow https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark from now on.
[GitHub] spark issue #14424: Add test:DenseMatrix.rand with no rng
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/14424 It is added for https://github.com/apache/spark/pull/14422
[GitHub] spark issue #14422: Add rand(numRows: Int, numCols: Int) functions
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/14422 I added a test: https://github.com/apache/spark/pull/14424
[GitHub] spark pull request #14424: Add test:DenseMatrix.rand with no rng
GitHub user xubo245 opened a pull request: https://github.com/apache/spark/pull/14424 Add test: DenseMatrix.rand with no rng ## What changes were proposed in this pull request? Add a test for DenseMatrix.rand with no rng. You can merge this pull request into a Git repository by running: $ git pull https://github.com/xubo245/spark patch-3 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14424.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14424 commit 2e945479af45fdadd8abb4529173db04226de64e Author: 徐波 <601450...@qq.com> Date: 2016-07-30T18:00:01Z Add test: DenseMatrix.rand with no rng
[GitHub] spark pull request #14422: Add rand(numRows: Int, numCols: Int) functions
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/14422#discussion_r72890434 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -497,6 +497,20 @@ object DenseMatrix { } /** +* Generate a `DenseMatrix` consisting of `i.i.d.` uniform random numbers. +* +* @param numRows number of rows of the matrix +* @param numCols number of columns of the matrix +* @return `DenseMatrix` with size `numRows` x `numCols` and values in U(0, 1) +*/ + @Since("2.0.0") + def rand(numRows: Int, numCols: Int): DenseMatrix = { +require(numRows.toLong * numCols <= Int.MaxValue, + s"$numRows x $numCols dense matrix is too large to allocate") +new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)((new Random).nextDouble())) --- End diff -- We can fix the RNG so that one RNG is shared by all elements: val rng = new Random() new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextDouble()))
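The review point above (allocate one Random and reuse it, instead of a new Random per element as in the original diff) can be sketched without the Spark DenseMatrix class. The helper name below is hypothetical and it returns only the backing Array[Double]:

```scala
import scala.util.Random

// Hypothetical helper sketching the review suggestion: one RNG, created
// once, fills every element of the column-major value array.
def randValues(numRows: Int, numCols: Int): Array[Double] = {
  require(numRows.toLong * numCols <= Int.MaxValue,
    s"$numRows x $numCols dense matrix is too large to allocate")
  val rng = new Random() // shared by all elements, not re-created per element
  Array.fill(numRows * numCols)(rng.nextDouble())
}
```

Besides avoiding per-element allocation, sharing one RNG also lets a caller pass a seeded Random for reproducible matrices, which the per-element version cannot do.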
[GitHub] spark pull request #14423: Add zeros(size: Int) function
GitHub user xubo245 opened a pull request: https://github.com/apache/spark/pull/14423 Add zeros(size: Int) function ## What changes were proposed in this pull request? Generate a `DenseVector` consisting of zeros. It can replace breeze.linalg.DenseVector#zeros[Double] You can merge this pull request into a Git repository by running: $ git pull https://github.com/xubo245/spark patch-2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14423.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14423 commit b111cdb9236c87af18e1ea773cea72b73fc68561 Author: 徐波 <601450...@qq.com> Date: 2016-07-30T16:30:47Z Add zeros(size: Int) function Generate a `DenseVector` consisting of zeros. It can replace breeze.linalg.DenseVector#zeros[Double]
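A zeros(size) helper of the kind this PR proposes amounts to allocating a zero-initialized backing array. A hedged standalone sketch, with a plain Array[Double] standing in for the DenseVector wrapper:

```scala
// Sketch of the proposed helper: JVM arrays of Double are zero-initialized
// on allocation, so a vector of zeros only needs a fresh backing array.
def zeros(size: Int): Array[Double] = {
  require(size >= 0, s"size must be non-negative, got $size")
  new Array[Double](size)
}
```

In the actual proposal the array would be wrapped in a DenseVector, mirroring breeze.linalg.DenseVector.zeros[Double](size).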
[GitHub] spark issue #14422: Add rand(numRows: Int, numCols: Int) functions
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/14422 We can use it to replace breeze.linalg.DenseMatrix.rand(numRows: Int, numCols: Int).
[GitHub] spark pull request #14422: Add rand(numRows: Int, numCols: Int) functions
GitHub user xubo245 opened a pull request: https://github.com/apache/spark/pull/14422 Add rand(numRows: Int, numCols: Int) functions ## What changes were proposed in this pull request? Add rand(numRows: Int, numCols: Int) functions to the DenseMatrix object, like breeze.linalg.DenseMatrix.rand() You can merge this pull request into a Git repository by running: $ git pull https://github.com/xubo245/spark patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14422.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14422 commit a7a1261f52112a3bca375dd0bed1c1bc0a2e0ed8 Author: 徐波 <601450...@qq.com> Date: 2016-07-30T15:43:36Z Add rand(numRows: Int, numCols: Int) functions add rand(numRows: Int, numCols: Int) functions to the DenseMatrix object, like breeze.linalg.DenseMatrix.rand()