[GitHub] spark pull request #20260: [SPARK-23039][SQL] Finish TODO work in alter tabl...
Github user xubo245 closed the pull request at: https://github.com/apache/spark/pull/20260 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20260: [SPARK-23039][SQL] Finish TODO work in alter tabl...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20260#discussion_r180359425 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -787,7 +787,7 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat val storageWithLocation = { val tableLocation = getLocationFromStorageProps(table) // We pass None as `newPath` here, to remove the path option in storage properties. - updateLocationInStorageProps(table, newPath = None).copy( + table.storage.copy( --- End diff -- So we should remove the TODO comment/work?
[GitHub] spark issue #20260: [SPARK-23039][SQL] Finish TODO work in alter table set l...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20260 This PR is different
[GitHub] spark issue #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION should chan...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20249 Exactly, SET LOCATION works; it just does not update the path in the storage properties when the path of a partition in the table is changed. This PR just wants to finish this TODO(gatorsmile) and add the partition path to the properties.
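For illustration, the TODO under discussion (keeping the `path` entry of the storage properties in sync when a partition's location changes) can be sketched with a plain dict. This is a hypothetical model for the comment above, not Spark's actual `CatalogStorageFormat` API:

```python
def update_location_in_storage_props(props, new_path):
    """Return a copy of the storage properties with the 'path' entry
    kept in sync with the new location (None removes the entry)."""
    updated = {k: v for k, v in props.items() if k.lower() != "path"}
    if new_path is not None:
        updated["path"] = new_path
    return updated

# Changing a partition's location should also refresh the path property.
props = {"path": "/warehouse/t/part=1", "serialization.format": "1"}
moved = update_location_in_storage_props(props, "/new/warehouse/t/part=1")
removed = update_location_in_storage_props(props, None)
```

Passing `None`, as the quoted diff does, drops the stale `path` option instead of leaving it pointing at the old location.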
[GitHub] spark issue #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION should chan...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20249 It belongs to the TODO work @tgravescs
[GitHub] spark issue #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION should chan...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20249 rebase
[GitHub] spark issue #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION should chan...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20249 @gatorsmile @attilapiros Could you help to review? How should this PR be handled? It has been a long time since it was created.
[GitHub] spark pull request #20583: [SPARK-23392][TEST] Add some test cases for image...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20583#discussion_r167740107 --- Diff: mllib/src/test/scala/org/apache/spark/ml/image/ImageSchemaSuite.scala --- @@ -65,11 +65,71 @@ class ImageSchemaSuite extends SparkFunSuite with MLlibTestSparkContext { assert(count50 > 0 && count50 < countTotal) } + test("readImages test: recursive = false") { +val df = readImages(imagePath, null, false, 3, true, 1.0, 0) +assert(df.count() === 0) + } + + test("readImages test: read jpg image") { +val df = readImages(imagePath + "/kittens/DP153539.jpg", null, false, 3, true, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: read png image") { +val df = readImages(imagePath + "/multi-channel/BGRA.png", null, false, 3, true, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: read non image") { +val df = readImages(imagePath + "/kittens/not-image.txt", null, false, 3, true, 1.0, 0) +assert(df.schema("image").dataType == columnSchema, "data do not fit ImageSchema") +assert(df.count() === 0) + } + + test("readImages test: read non image and dropImageFailures is false") { +val df = readImages(imagePath + "/kittens/not-image.txt", null, false, 3, false, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: sampleRatio > 1") { +val e = intercept[IllegalArgumentException] { + readImages(imagePath, null, true, 3, true, 1.1, 0) +} +assert(e.getMessage.equals("requirement failed: sampleRatio should be between 0 and 1")) + } + + test("readImages test: sampleRatio < 0") { +val e = intercept[IllegalArgumentException] { + readImages(imagePath, null, true, 3, true, -0.1, 0) +} +assert(e.getMessage.equals("requirement failed: sampleRatio should be between 0 and 1")) --- End diff -- Thanks, done --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20583: [SPARK-23392][TEST] Add some test cases for image...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20583#discussion_r167545871 --- Diff: mllib/src/test/scala/org/apache/spark/ml/image/ImageSchemaSuite.scala --- @@ -65,11 +65,71 @@ class ImageSchemaSuite extends SparkFunSuite with MLlibTestSparkContext { assert(count50 > 0 && count50 < countTotal) } + test("readImages test: recursive = false") { +val df = readImages(imagePath, null, false, 3, true, 1.0, 0) +assert(df.count() === 0) + } + + test("readImages test: read jpg image") { +val df = readImages(imagePath + "/kittens/DP153539.jpg", null, false, 3, true, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: read png image") { +val df = readImages(imagePath + "/multi-channel/BGRA.png", null, false, 3, true, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: read non image") { +val df = readImages(imagePath + "/kittens/not-image.txt", null, false, 3, true, 1.0, 0) +assert(df.count() === 0) + } + + test("readImages test: read non image and dropImageFailures is false") { +val df = readImages(imagePath + "/kittens/not-image.txt", null, false, 3, false, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: sampleRatio > 1") { +val e = intercept[IllegalArgumentException] { + readImages(imagePath, null, true, 3, true, 1.1, 0) +} +assert(e.getMessage.equals("requirement failed: sampleRatio should be between 0 and 1")) + } + + test("readImages test: sampleRatio < 0") { +val e = intercept[IllegalArgumentException] { + readImages(imagePath, null, true, 3, true, -0.1, 0) +} +assert(e.getMessage.equals("requirement failed: sampleRatio should be between 0 and 1")) + } + + test("readImages test: sampleRatio = 0") { +val df = readImages(imagePath, null, true, 3, true, 0.0, 0) +assert(df.count() === 0) + } + + test("readImages test: with sparkSession") { +val df = readImages(imagePath, sparkSession = spark, true, 3, true, 1.0, 0) --- End diff -- Can you check it? 
This PR was not merged into branch-2.3: https://github.com/apache/spark/pull/20389 I fetched the code of branch 2.3 before.
[GitHub] spark pull request #20583: [SPARK-23392][TEST] Add some test cases for image...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20583#discussion_r167481154 --- Diff: mllib/src/test/scala/org/apache/spark/ml/image/ImageSchemaSuite.scala --- @@ -65,11 +65,71 @@ class ImageSchemaSuite extends SparkFunSuite with MLlibTestSparkContext { assert(count50 > 0 && count50 < countTotal) } + test("readImages test: recursive = false") { +val df = readImages(imagePath, null, false, 3, true, 1.0, 0) +assert(df.count() === 0) + } + + test("readImages test: read jpg image") { +val df = readImages(imagePath + "/kittens/DP153539.jpg", null, false, 3, true, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: read png image") { +val df = readImages(imagePath + "/multi-channel/BGRA.png", null, false, 3, true, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: read non image") { +val df = readImages(imagePath + "/kittens/not-image.txt", null, false, 3, true, 1.0, 0) +assert(df.count() === 0) + } + + test("readImages test: read non image and dropImageFailures is false") { +val df = readImages(imagePath + "/kittens/not-image.txt", null, false, 3, false, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: sampleRatio > 1") { +val e = intercept[IllegalArgumentException] { + readImages(imagePath, null, true, 3, true, 1.1, 0) +} +assert(e.getMessage.equals("requirement failed: sampleRatio should be between 0 and 1")) + } + + test("readImages test: sampleRatio < 0") { +val e = intercept[IllegalArgumentException] { + readImages(imagePath, null, true, 3, true, -0.1, 0) +} +assert(e.getMessage.equals("requirement failed: sampleRatio should be between 0 and 1")) + } + + test("readImages test: sampleRatio = 0") { +val df = readImages(imagePath, null, true, 3, true, 0.0, 0) +assert(df.count() === 0) + } + + test("readImages test: with sparkSession") { +val df = readImages(imagePath, sparkSession = spark, true, 3, true, 1.0, 0) --- End diff -- It should be. I try it now. 
[GitHub] spark pull request #20583: [SPARK-23392][TEST] Add some test cases for image...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20583#discussion_r167479724 --- Diff: mllib/src/test/scala/org/apache/spark/ml/image/ImageSchemaSuite.scala --- @@ -65,11 +65,71 @@ class ImageSchemaSuite extends SparkFunSuite with MLlibTestSparkContext { assert(count50 > 0 && count50 < countTotal) } + test("readImages test: recursive = false") { +val df = readImages(imagePath, null, false, 3, true, 1.0, 0) +assert(df.count() === 0) + } + + test("readImages test: read jpg image") { +val df = readImages(imagePath + "/kittens/DP153539.jpg", null, false, 3, true, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: read png image") { +val df = readImages(imagePath + "/multi-channel/BGRA.png", null, false, 3, true, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: read non image") { +val df = readImages(imagePath + "/kittens/not-image.txt", null, false, 3, true, 1.0, 0) +assert(df.count() === 0) + } + + test("readImages test: read non image and dropImageFailures is false") { +val df = readImages(imagePath + "/kittens/not-image.txt", null, false, 3, false, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: sampleRatio > 1") { +val e = intercept[IllegalArgumentException] { + readImages(imagePath, null, true, 3, true, 1.1, 0) +} +assert(e.getMessage.equals("requirement failed: sampleRatio should be between 0 and 1")) + } + + test("readImages test: sampleRatio < 0") { +val e = intercept[IllegalArgumentException] { + readImages(imagePath, null, true, 3, true, -0.1, 0) +} +assert(e.getMessage.equals("requirement failed: sampleRatio should be between 0 and 1")) + } + + test("readImages test: sampleRatio = 0") { +val df = readImages(imagePath, null, true, 3, true, 0.0, 0) +assert(df.count() === 0) + } + + test("readImages test: with sparkSession") { +val df = readImages(imagePath, sparkSession = spark, true, 3, true, 1.0, 0) +assert(df.count() === 7) + } + test("readImages partition test") 
{ val df = readImages(imagePath, null, true, 3, true, 1.0, 0) assert(df.rdd.getNumPartitions === 3) } + test("readImages partition test: < 0") { +val df = readImages(imagePath, null, true, -3, true, 1.0, 0) +assert(df.rdd.getNumPartitions === spark.sparkContext.defaultParallelism) + } + + test("readImages partition test: = 0") { +val df = readImages(imagePath, null, true, 0, true, 1.0, 0) +assert(df.rdd.getNumPartitions != 0) --- End diff -- Ok, done
[GitHub] spark pull request #20583: [SPARK-23392][TEST] Add some test cases for image...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20583#discussion_r167479658 --- Diff: mllib/src/test/scala/org/apache/spark/ml/image/ImageSchemaSuite.scala --- @@ -65,11 +65,71 @@ class ImageSchemaSuite extends SparkFunSuite with MLlibTestSparkContext { assert(count50 > 0 && count50 < countTotal) } + test("readImages test: recursive = false") { +val df = readImages(imagePath, null, false, 3, true, 1.0, 0) +assert(df.count() === 0) + } + + test("readImages test: read jpg image") { +val df = readImages(imagePath + "/kittens/DP153539.jpg", null, false, 3, true, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: read png image") { +val df = readImages(imagePath + "/multi-channel/BGRA.png", null, false, 3, true, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: read non image") { +val df = readImages(imagePath + "/kittens/not-image.txt", null, false, 3, true, 1.0, 0) +assert(df.count() === 0) + } + + test("readImages test: read non image and dropImageFailures is false") { +val df = readImages(imagePath + "/kittens/not-image.txt", null, false, 3, false, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: sampleRatio > 1") { +val e = intercept[IllegalArgumentException] { + readImages(imagePath, null, true, 3, true, 1.1, 0) +} +assert(e.getMessage.equals("requirement failed: sampleRatio should be between 0 and 1")) + } + + test("readImages test: sampleRatio < 0") { +val e = intercept[IllegalArgumentException] { + readImages(imagePath, null, true, 3, true, -0.1, 0) +} +assert(e.getMessage.equals("requirement failed: sampleRatio should be between 0 and 1")) + } + + test("readImages test: sampleRatio = 0") { +val df = readImages(imagePath, null, true, 3, true, 0.0, 0) +assert(df.count() === 0) + } + + test("readImages test: with sparkSession") { +val df = readImages(imagePath, sparkSession = spark, true, 3, true, 1.0, 0) +assert(df.count() === 7) + } + test("readImages partition test") 
{ val df = readImages(imagePath, null, true, 3, true, 1.0, 0) assert(df.rdd.getNumPartitions === 3) } + test("readImages partition test: < 0") { +val df = readImages(imagePath, null, true, -3, true, 1.0, 0) +assert(df.rdd.getNumPartitions === spark.sparkContext.defaultParallelism) + } + + test("readImages partition test: = 0") { --- End diff -- test
[GitHub] spark pull request #20583: [SPARK-23392][TEST] Add some test cases for image...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20583#discussion_r167479566 --- Diff: mllib/src/test/scala/org/apache/spark/ml/image/ImageSchemaSuite.scala --- @@ -65,11 +65,71 @@ class ImageSchemaSuite extends SparkFunSuite with MLlibTestSparkContext { assert(count50 > 0 && count50 < countTotal) } + test("readImages test: recursive = false") { +val df = readImages(imagePath, null, false, 3, true, 1.0, 0) +assert(df.count() === 0) + } + + test("readImages test: read jpg image") { +val df = readImages(imagePath + "/kittens/DP153539.jpg", null, false, 3, true, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: read png image") { +val df = readImages(imagePath + "/multi-channel/BGRA.png", null, false, 3, true, 1.0, 0) +assert(df.count() === 1) + } + + test("readImages test: read non image") { +val df = readImages(imagePath + "/kittens/not-image.txt", null, false, 3, true, 1.0, 0) +assert(df.count() === 0) + } + + test("readImages test: read non image and dropImageFailures is false") { +val df = readImages(imagePath + "/kittens/not-image.txt", null, false, 3, false, 1.0, 0) +assert(df.count() === 1) --- End diff -- Ok, done --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
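The sampleRatio tests in the diff above compare against the exact message produced by Scala's `require`. A minimal Python sketch of the same precondition check, and of how a test can capture the message (the function name is hypothetical):

```python
def check_sample_ratio(sample_ratio):
    # Mirrors Scala's require(sampleRatio >= 0 && sampleRatio <= 1, ...):
    # same "requirement failed: ..." message the test cases assert on.
    if not (0 <= sample_ratio <= 1):
        raise ValueError(
            "requirement failed: sampleRatio should be between 0 and 1")
    return sample_ratio

# A test catches the exception and compares the exact message:
try:
    check_sample_ratio(-0.1)
    message = None
except ValueError as e:
    message = str(e)
```

Both out-of-range cases in the suite (1.1 and -0.1) would fail this check with the same message; 0.0 and 1.0 pass, which is why the `sampleRatio = 0` case needs its own test asserting an empty result rather than an exception.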
[GitHub] spark issue #20583: [SPARK-23392][TEST] Add some test cases for images featu...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20583 Sorry, done.
[GitHub] spark pull request #20583: [CARBONDATA-23392][TEST] Add some test cases for ...
GitHub user xubo245 opened a pull request: https://github.com/apache/spark/pull/20583 [CARBONDATA-23392][TEST] Add some test cases for images feature ## What changes were proposed in this pull request? Add some test cases for images feature ## How was this patch tested? Add some test cases in ImageSchemaSuite You can merge this pull request into a Git repository by running: $ git pull https://github.com/xubo245/spark CARBONDATA23392_AddTestForImage Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20583.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20583 commit bc375112274d862de536841208d6e7cda151afe2 Author: xubo245 <601450868@...> Date: 2018-02-12T03:28:41Z [CARBONDATA-23392][TEST] Add some test case for images feature
[GitHub] spark pull request #20456: [SPARK-22624][PYSPARK] Expose range partitioning ...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20456#discussion_r166162010 --- Diff: python/pyspark/sql/dataframe.py --- @@ -667,6 +667,51 @@ def repartition(self, numPartitions, *cols): else: raise TypeError("numPartitions should be an int or Column") +@since("2.3.0") --- End diff -- ok, rename to 2.4.0, done
[GitHub] spark pull request #20456: [SPARK-22624][PYSPARK] Expose range partitioning ...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20456#discussion_r165838751 --- Diff: python/pyspark/sql/dataframe.py --- @@ -667,6 +667,51 @@ def repartition(self, numPartitions, *cols): else: raise TypeError("numPartitions should be an int or Column") +@since("2.3.0") +def repartitionByRange(self, numPartitions, *cols): +""" +Returns a new :class:`DataFrame` partitioned by the given partitioning expressions. The +resulting DataFrame is range partitioned. + +``numPartitions`` can be an int to specify the target number of partitions or a Column. +If it is a Column, it will be used as the first partitioning column. If not specified, +the default number of partitions is used. + +At least one partition-by expression must be specified. +When no explicit sort order is specified, "ascending nulls first" is assumed. + +>>> df.repartitionByRange(2, "age").rdd.getNumPartitions() +2 +>>> df.show() ++---+-+ +|age| name| ++---+-+ +| 2|Alice| +| 5| Bob| ++---+-+ +>>> df.repartitionByRange(1, "age").rdd.getNumPartitions() +1 +>>> data = df.repartitionByRange("age") +>>> df.show() ++---+-+ +|age| name| ++---+-+ +| 2|Alice| +| 5| Bob| ++---+-+ +""" +if isinstance(numPartitions, int): +if len(cols) == 0: +return ValueError("At least one partition-by expression must be specified.") +else: +return DataFrame( +self._jdf.repartitionByRange(numPartitions, self._jcols(*cols)), self.sql_ctx) +elif isinstance(numPartitions, (basestring, Column)): +cols = (numPartitions,) + cols +return DataFrame(self._jdf.repartitionByRange(self._jcols(*cols)), self.sql_ctx) +else: +raise TypeError("numPartitions should be an int or Column") --- End diff -- ok, done --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20456: [SPARK-22624][PYSPARK] Expose range partitioning ...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20456#discussion_r165836612 --- Diff: python/pyspark/sql/dataframe.py --- @@ -667,6 +667,92 @@ def repartition(self, numPartitions, *cols): else: raise TypeError("numPartitions should be an int or Column") +@since("2.3.0") +def repartitionByRange(self, numPartitions, *cols): +""" +Returns a new :class:`DataFrame` partitioned by the given partitioning expressions. The +resulting DataFrame is range partitioned. + +``numPartitions`` can be an int to specify the target number of partitions or a Column. +If it is a Column, it will be used as the first partitioning column. If not specified, +the default number of partitions is used. + +At least one partition-by expression must be specified. +When no explicit sort order is specified, "ascending nulls first" is assumed. + +>>> df.repartitionByRange(2, "age").rdd.getNumPartitions() +2 +>>> df.show() ++---+-+ +|age| name| ++---+-+ +| 2|Alice| +| 5| Bob| ++---+-+ +>>> df.repartitionByRange(1, "age").rdd.getNumPartitions() +1 +>>> data = df.union(df) +>>> data.show() ++---+-+ +|age| name| ++---+-+ +| 2|Alice| +| 5| Bob| +| 2|Alice| +| 5| Bob| ++---+-+ +>>> data = data.repartitionByRange(3, "age") +>>> data.show() ++---+-+ +|age| name| ++---+-+ +| 2|Alice| +| 2|Alice| +| 5| Bob| +| 5| Bob| ++---+-+ +>>> data.rdd.getNumPartitions() +3 +>>> data = data.repartitionByRange("age") +>>> data.rdd.getNumPartitions() +3 +>>> data2 = df.union(df).union(df) --- End diff -- ok, remove union --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20456: [SPARK-22624][PYSPARK] Expose range partitioning ...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20456#discussion_r165814225 --- Diff: python/pyspark/sql/dataframe.py --- @@ -667,6 +667,92 @@ def repartition(self, numPartitions, *cols): else: raise TypeError("numPartitions should be an int or Column") +@since("2.3.0") +def repartitionByRange(self, numPartitions, *cols): +""" +Returns a new :class:`DataFrame` partitioned by the given partitioning expressions. The +resulting DataFrame is range partitioned. + +``numPartitions`` can be an int to specify the target number of partitions or a Column. +If it is a Column, it will be used as the first partitioning column. If not specified, +the default number of partitions is used. + +At least one partition-by expression must be specified. +When no explicit sort order is specified, "ascending nulls first" is assumed. + +>>> df.repartitionByRange(2, "age").rdd.getNumPartitions() +2 +>>> df.show() ++---+-+ +|age| name| ++---+-+ +| 2|Alice| +| 5| Bob| ++---+-+ +>>> df.repartitionByRange(1, "age").rdd.getNumPartitions() +1 +>>> data = df.union(df) +>>> data.show() ++---+-+ +|age| name| ++---+-+ +| 2|Alice| +| 5| Bob| +| 2|Alice| +| 5| Bob| ++---+-+ +>>> data = data.repartitionByRange(3, "age") +>>> data.show() ++---+-+ +|age| name| ++---+-+ +| 2|Alice| +| 2|Alice| +| 5| Bob| +| 5| Bob| ++---+-+ +>>> data.rdd.getNumPartitions() +3 +>>> data = data.repartitionByRange("age") +>>> data.rdd.getNumPartitions() +3 +>>> data2 = df.union(df).union(df) --- End diff -- How to test data after repartitionByRange("age") --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20456: [SPARK-22624][PYSPARK] Expose range partitioning ...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20456#discussion_r165814229 --- Diff: python/pyspark/sql/dataframe.py --- @@ -667,6 +667,92 @@ def repartition(self, numPartitions, *cols): else: raise TypeError("numPartitions should be an int or Column") +@since("2.3.0") +def repartitionByRange(self, numPartitions, *cols): +""" +Returns a new :class:`DataFrame` partitioned by the given partitioning expressions. The +resulting DataFrame is range partitioned. + +``numPartitions`` can be an int to specify the target number of partitions or a Column. +If it is a Column, it will be used as the first partitioning column. If not specified, +the default number of partitions is used. + +At least one partition-by expression must be specified. +When no explicit sort order is specified, "ascending nulls first" is assumed. + +>>> df.repartitionByRange(2, "age").rdd.getNumPartitions() +2 +>>> df.show() ++---+-+ +|age| name| ++---+-+ +| 2|Alice| +| 5| Bob| ++---+-+ +>>> df.repartitionByRange(1, "age").rdd.getNumPartitions() +1 +>>> data = df.union(df) --- End diff -- How to test data after repartitionByRange("age")? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
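One way to answer "how to test data after repartitionByRange" without depending on row order is to check the range invariant directly: every key in partition i must be <= every key in partition i+1. The helper below is a plain-Python sketch of that check (not a pyspark API), applied to lists that stand in for collected partitions:

```python
def is_range_partitioned(partitions, key=lambda row: row):
    """True if every key in partition i is <= every key in partition i+1,
    i.e. the partitions form non-overlapping, ordered key ranges."""
    non_empty = [sorted(key(r) for r in p) for p in partitions if p]
    return all(prev[-1] <= nxt[0]
               for prev, nxt in zip(non_empty, non_empty[1:]))

# Rows as (age, name) tuples, using the sample data from the doctest above:
ok = is_range_partitioned(
    [[(2, "Alice"), (2, "Alice")], [(5, "Bob"), (5, "Bob")]],
    key=lambda row: row[0])
bad = is_range_partitioned([[(5, "Bob")], [(2, "Alice")]],
                           key=lambda row: row[0])
```

In a real pyspark test the partition lists could come from `df.rdd.glom().collect()`; the invariant check itself stays the same.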
[GitHub] spark issue #20456: [SPARK-22624][PYSPARK] Expose range partitioning shuffle...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20456 @gatorsmile @HyukjinKwon Please review it again, thanks
[GitHub] spark issue #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION should chan...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20249 @gatorsmile @dongjoon-hyun please review it
[GitHub] spark pull request #20456: [SPARK-22624][PYSPARK] Expose range partitioning ...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20456#discussion_r165643385 --- Diff: python/pyspark/sql/dataframe.py --- @@ -667,6 +667,55 @@ def repartition(self, numPartitions, *cols): else: raise TypeError("numPartitions should be an int or Column") +@since("2.3.0") +def repartitionByRange(self, numPartitions, *cols, **kwargs): +""" +Returns a new :class:`DataFrame` partitioned by the given partitioning expressions. The +resulting DataFrame is range partitioned. + +``numPartitions`` can be an int to specify the target number of partitions or a Column. +If it is a Column, it will be used as the first partitioning column. If not specified, +the default number of partitions is used. --- End diff -- ok, done, please review --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20456: [SPARK-22624][PYSPARK] Expose range partitioning ...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20456#discussion_r165643349 --- Diff: python/pyspark/sql/dataframe.py --- @@ -667,6 +667,55 @@ def repartition(self, numPartitions, *cols): else: raise TypeError("numPartitions should be an int or Column") +@since("2.3.0") +def repartitionByRange(self, numPartitions, *cols, **kwargs): +""" +Returns a new :class:`DataFrame` partitioned by the given partitioning expressions. The +resulting DataFrame is range partitioned. + +``numPartitions`` can be an int to specify the target number of partitions or a Column. +If it is a Column, it will be used as the first partitioning column. If not specified, +the default number of partitions is used. + +At least one partition-by expression must be specified. +When no explicit sort order is specified, "ascending nulls first" is assumed. + +>>> df.repartitionByRange(2, "age").rdd.getNumPartitions() +2 +>>> data = df.union(df).repartition(1, "age") +>>> data.rdd.getNumPartitions() +1 +>>> data.show() ++---+-+ +|age| name| ++---+-+ +| 2|Alice| +| 5| Bob| +| 2|Alice| +| 5| Bob| ++---+-+ +>>> data = data.repartitionByRange(3, "age") +>>> data.show() ++---+-+ +|age| name| ++---+-+ +| 2|Alice| +| 2|Alice| +| 5| Bob| +| 5| Bob| ++---+-+ +>>> data.rdd.getNumPartitions() +3 +""" +if isinstance(numPartitions, int): +if len(cols) == 0: +return ValueError("At least one partition-by expression must be specified.") +else: +return DataFrame( +self._jdf.repartitionByRange(numPartitions, self._jcols(*cols)), self.sql_ctx) +else: --- End diff -- ok,done, please review --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20456: [SPARK-22624][PYSPARK] Expose range partitioning ...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20456#discussion_r165638989 --- Diff: python/pyspark/sql/dataframe.py --- @@ -667,6 +667,55 @@ def repartition(self, numPartitions, *cols): else: raise TypeError("numPartitions should be an int or Column") +@since("2.3.0") +def repartitionByRange(self, numPartitions, *cols, **kwargs): +""" +Returns a new :class:`DataFrame` partitioned by the given partitioning expressions. The +resulting DataFrame is range partitioned. + +``numPartitions`` can be an int to specify the target number of partitions or a Column. +If it is a Column, it will be used as the first partitioning column. If not specified, +the default number of partitions is used. + +At least one partition-by expression must be specified. +When no explicit sort order is specified, "ascending nulls first" is assumed. + +>>> df.repartitionByRange(2, "age").rdd.getNumPartitions() +2 +>>> data = df.union(df).repartition(1, "age") +>>> data.rdd.getNumPartitions() +1 +>>> data.show() ++---+-+ +|age| name| ++---+-+ +| 2|Alice| +| 5| Bob| +| 2|Alice| +| 5| Bob| ++---+-+ +>>> data = data.repartitionByRange(3, "age") +>>> data.show() ++---+-+ +|age| name| ++---+-+ +| 2|Alice| +| 2|Alice| +| 5| Bob| +| 5| Bob| ++---+-+ +>>> data.rdd.getNumPartitions() +3 +""" +if isinstance(numPartitions, int): +if len(cols) == 0: +return ValueError("At least one partition-by expression must be specified.") +else: +return DataFrame( +self._jdf.repartitionByRange(numPartitions, self._jcols(*cols)), self.sql_ctx) +else: --- End diff -- it throws TypeError; how to handle that in a test case?
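The diff above has two issues the review touches on: `return ValueError(...)` should be `raise`, and a test needs a way to assert on the raised `TypeError`. A minimal pure-Python sketch of the corrected dispatch (names simplified; the JVM forwarding calls are replaced by tuples purely for illustration):

```python
def repartition_by_range(num_partitions, *cols):
    # Sketch of the argument dispatch only; the real method forwards
    # the call to the JVM DataFrame via self._jdf.repartitionByRange.
    if isinstance(num_partitions, int):
        if len(cols) == 0:
            # raise, not return: a returned exception object is silently ignored
            raise ValueError(
                "At least one partition-by expression must be specified.")
        return ("range", num_partitions, cols)
    elif isinstance(num_partitions, str):
        # a column given in the first slot becomes the first partitioning column
        return ("range", None, (num_partitions,) + cols)
    else:
        raise TypeError("numPartitions should be an int or Column")

# A test handles the raised TypeError by catching it and checking the message:
try:
    repartition_by_range(1.5, "age")
    err = None
except TypeError as e:
    err = str(e)
```

In pyspark's own test suite the same pattern is usually written with `self.assertRaises(TypeError, ...)`, which wraps this try/except for you.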
[GitHub] spark pull request #20456: [SPARK-22624][PYSPARK] Expose range partitioning ...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20456#discussion_r165627387 --- Diff: python/pyspark/sql/dataframe.py --- @@ -667,6 +667,55 @@ def repartition(self, numPartitions, *cols): else: raise TypeError("numPartitions should be an int or Column") +@since("2.3.0") +def repartitionByRange(self, numPartitions, *cols, **kwargs): --- End diff -- sorry, it's unused; removed it
[GitHub] spark pull request #20456: [SPARK-22624][PYSPARK] Expose range partitioning ...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20456#discussion_r165627994 --- Diff: python/pyspark/sql/dataframe.py --- @@ -667,6 +667,55 @@ def repartition(self, numPartitions, *cols): else: raise TypeError("numPartitions should be an int or Column") +@since("2.3.0") +def repartitionByRange(self, numPartitions, *cols, **kwargs): +""" +Returns a new :class:`DataFrame` partitioned by the given partitioning expressions. The +resulting DataFrame is range partitioned. + +``numPartitions`` can be an int to specify the target number of partitions or a Column. +If it is a Column, it will be used as the first partitioning column. If not specified, +the default number of partitions is used. + +At least one partition-by expression must be specified. +When no explicit sort order is specified, "ascending nulls first" is assumed. + +>>> df.repartitionByRange(2, "age").rdd.getNumPartitions() +2 +>>> data = df.union(df).repartition(1, "age") --- End diff -- ok, change it to repartitionByRange --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20456: [SPARK-22624][PYSPARK] Expose range partitioning shuffle...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20456 @gatorsmile please review it.
[GitHub] spark pull request #20456: [SPARK-22624][PYSPARK] Expose range partitioning ...
GitHub user xubo245 opened a pull request: https://github.com/apache/spark/pull/20456 [SPARK-22624][PYSPARK] Expose range partitioning shuffle introduced by spark-22614 ## What changes were proposed in this pull request? Expose range partitioning shuffle introduced by spark-22614 ## How was this patch tested? Unit test in dataframe.py Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/xubo245/spark SPARK22624_PysparkRangePartition Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20456.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20456 commit 8aaeee331df8b57a36238212eecf238e0c093d93 Author: xubo245 <601450868@...> Date: 2018-01-31T14:29:50Z [SPARK-22624][PYSPARK] Expose range partitioning shuffle introduced by spark-22614
[GitHub] spark issue #20250: [SPARK-23059][SQL][TEST] Correct some improper with view...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20250 Thanks.
[GitHub] spark issue #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION should chan...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20249 @gatorsmile Please review it
[GitHub] spark issue #20250: [SPARK-23059][SQL][TEST] Correct some improper with view...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20250 @gatorsmile @jiangxb1987 @dongjoon-hyun Can this PR be merged? I will fix it if there are any problems.
[GitHub] spark issue #20260: [SPARK-23039][SQL] Finish TODO work in alter table set l...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20260 I will fix the error in this PR after https://github.com/apache/spark/pull/20249#issuecomment-358720962 is merged
[GitHub] spark issue #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION should chan...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20249 @gatorsmile Tests pass, please review it.
[GitHub] spark pull request #20227: [SPARK-23035][SQL] Fix improper information of Te...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20227#discussion_r161981394 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AlreadyExistException.scala --- @@ -31,7 +31,7 @@ class TableAlreadyExistsException(db: String, table: String) extends AnalysisException(s"Table or view '$table' already exists in database '$db'") class TempTableAlreadyExistsException(table: String) --- End diff -- I think we should rename it. But @gatorsmile said "We do not want to introduce a new exception type. In contrast, we planned to remove all these exception sub-types because PySpark might output a confusing error message.". So I revert it.
[GitHub] spark pull request #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION shou...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20249#discussion_r161648924 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala --- @@ -751,6 +751,25 @@ class HiveDDLSuite } } + test("SPARK-23057: SET LOCATION should change the path of partition in table") { +withTable("boxes") { + sql("CREATE TABLE boxes (height INT, length INT) PARTITIONED BY (width INT)") + sql("INSERT OVERWRITE TABLE boxes PARTITION (width=4) SELECT 4, 4") + val expected = "/path/to/part/ways" + sql(s"ALTER TABLE boxes PARTITION (width=4) SET LOCATION '$expected'") + val catalog = spark.sessionState.catalog --- End diff -- Good idea, I will try
[GitHub] spark pull request #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION shou...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20249#discussion_r161646360 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -1869,6 +1869,65 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { } } + test("SPARK-23057: SET LOCATION for managed table with partition") { +withTable("tbl_partition") { + withTempDir { dir => +sql("CREATE TABLE tbl_partition(col1 INT, col2 INT) USING parquet PARTITIONED BY (col1)") +sql("INSERT INTO tbl_partition PARTITION(col1=1) SELECT 11") +sql("INSERT INTO tbl_partition PARTITION(col1=2) SELECT 22") +checkAnswer(spark.table("tbl_partition"), Seq(Row(11, 1), Row(22, 2))) +val defaultTablePath = spark.sessionState.catalog + .getTableMetadata(TableIdentifier("tbl_partition")).storage.locationUri.get +try { + // before set location of partition col1 =1 and 2 + checkPath(defaultTablePath.toString, Map("col1" -> "1"), "tbl_partition") + checkPath(defaultTablePath.toString, Map("col1" -> "2"), "tbl_partition") + val path = dir.getCanonicalPath + + // set location of partition col1 =1 + sql(s"ALTER TABLE tbl_partition PARTITION (col1='1') SET LOCATION '$path'") + checkPath(dir.getCanonicalPath, Map("col1" -> "1"), "tbl_partition") --- End diff -- ok, done
[GitHub] spark pull request #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION shou...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20249#discussion_r161645520 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -800,6 +802,15 @@ case class AlterTableSetLocationCommand( CommandUtils.updateTableStats(sparkSession, table) Seq.empty[Row] } + + private def updatePathInProps( + storage: CatalogStorageFormat, + newPath: Option[String]): Map[String, String] = { --- End diff -- ok, done
[GitHub] spark pull request #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION shou...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20249#discussion_r161645457 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -1869,6 +1869,65 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { } } + test("SPARK-23057: SET LOCATION for managed table with partition") { +withTable("tbl_partition") { + withTempDir { dir => +sql("CREATE TABLE tbl_partition(col1 INT, col2 INT) USING parquet PARTITIONED BY (col1)") +sql("INSERT INTO tbl_partition PARTITION(col1=1) SELECT 11") +sql("INSERT INTO tbl_partition PARTITION(col1=2) SELECT 22") +checkAnswer(spark.table("tbl_partition"), Seq(Row(11, 1), Row(22, 2))) +val defaultTablePath = spark.sessionState.catalog + .getTableMetadata(TableIdentifier("tbl_partition")).storage.locationUri.get +try { + // before set location of partition col1 =1 and 2 + checkPath(defaultTablePath.toString, Map("col1" -> "1"), "tbl_partition") + checkPath(defaultTablePath.toString, Map("col1" -> "2"), "tbl_partition") + val path = dir.getCanonicalPath + + // set location of partition col1 =1 + sql(s"ALTER TABLE tbl_partition PARTITION (col1='1') SET LOCATION '$path'") + checkPath(dir.getCanonicalPath, Map("col1" -> "1"), "tbl_partition") + checkPath(defaultTablePath.toString, Map("col1" -> "2"), "tbl_partition") + + // set location of partition col1 =2 + sql(s"ALTER TABLE tbl_partition PARTITION (col1='2') SET LOCATION '$path'") + checkPath(dir.getCanonicalPath, Map("col1" -> "1"), "tbl_partition") + checkPath(dir.getCanonicalPath, Map("col1" -> "2"), "tbl_partition") + + spark.catalog.refreshTable("tbl_partition") + // SET LOCATION won't move data from previous table path to new table path. + assert(spark.table("tbl_partition").count() == 0) --- End diff -- ok, thanks
[GitHub] spark pull request #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION shou...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20249#discussion_r161645330 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -1869,6 +1869,65 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { } } + test("SPARK-23057: SET LOCATION for managed table with partition") { +withTable("tbl_partition") { + withTempDir { dir => +sql("CREATE TABLE tbl_partition(col1 INT, col2 INT) USING parquet PARTITIONED BY (col1)") +sql("INSERT INTO tbl_partition PARTITION(col1=1) SELECT 11") +sql("INSERT INTO tbl_partition PARTITION(col1=2) SELECT 22") +checkAnswer(spark.table("tbl_partition"), Seq(Row(11, 1), Row(22, 2))) +val defaultTablePath = spark.sessionState.catalog + .getTableMetadata(TableIdentifier("tbl_partition")).storage.locationUri.get +try { + // before set location of partition col1 =1 and 2 + checkPath(defaultTablePath.toString, Map("col1" -> "1"), "tbl_partition") + checkPath(defaultTablePath.toString, Map("col1" -> "2"), "tbl_partition") + val path = dir.getCanonicalPath + + // set location of partition col1 =1 + sql(s"ALTER TABLE tbl_partition PARTITION (col1='1') SET LOCATION '$path'") + checkPath(dir.getCanonicalPath, Map("col1" -> "1"), "tbl_partition") + checkPath(defaultTablePath.toString, Map("col1" -> "2"), "tbl_partition") + + // set location of partition col1 =2 + sql(s"ALTER TABLE tbl_partition PARTITION (col1='2') SET LOCATION '$path'") + checkPath(dir.getCanonicalPath, Map("col1" -> "1"), "tbl_partition") + checkPath(dir.getCanonicalPath, Map("col1" -> "2"), "tbl_partition") + + spark.catalog.refreshTable("tbl_partition") + // SET LOCATION won't move data from previous table path to new table path. + assert(spark.table("tbl_partition").count() == 0) + // the previous table path should still be there. + assert(new File(defaultTablePath).exists()) + + sql("INSERT INTO tbl_partition PARTITION(col1=2) SELECT 33") + // newly inserted data will go to the new table path. + assert(dir.listFiles().nonEmpty) + + sql("DROP TABLE tbl_partition") + // the new table path will be removed after DROP TABLE. + assert(!dir.exists()) +} finally { + Utils.deleteRecursively(new File(defaultTablePath)) +} + } +} + } + + def checkPath(path: String, partSpec: Map[String, String], table: String): Unit = { +val catalog = spark.sessionState.catalog +val spec = Some(partSpec) --- End diff -- ok, done
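The semantics this test pins down — SET LOCATION repoints the partition's metadata while leaving existing files where they are — can be sketched with a toy catalog in plain Python (the dict-based catalog and file layout are an illustration, not Spark's API):

```python
import os
import tempfile

# Toy catalog: partition spec -> storage location (a stand-in for the metastore).
catalog = {}

old_dir = tempfile.mkdtemp()
new_dir = tempfile.mkdtemp()

# "INSERT" writes data into the partition's original location.
with open(os.path.join(old_dir, "part-00000"), "w") as f:
    f.write("11")
catalog[("col1", "1")] = old_dir

# "ALTER TABLE ... PARTITION ... SET LOCATION" only rewrites the metadata entry.
catalog[("col1", "1")] = new_dir

# The old files were not moved, and the new location starts out empty,
# so a scan through the catalog now sees zero rows for this partition.
assert os.listdir(old_dir) == ["part-00000"]
assert os.listdir(new_dir) == []
```

This is why the Spark test expects `spark.table("tbl_partition").count() == 0` right after the location change, and why subsequent inserts land in the new directory.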
[GitHub] spark issue #20250: [SPARK-23059][SQL][TEST] Correct some improper with view...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20250 @dongjoon-hyun @gatorsmile Please review it. Thanks
[GitHub] spark issue #20227: [SPARK-23035][SQL] Fix improper information of TempTable...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20227 @dongjoon-hyun ok, done. The JIRA webpage shows "Maintenance in progress"
[GitHub] spark issue #20227: [SPARK-23035][SQL] Fix warning: TEMPORARY TABLE ... USIN...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20227 Retest this please.
[GitHub] spark pull request #20227: [SPARK-23035][SQL] Fix warning: TEMPORARY TABLE ....
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20227#discussion_r161386989 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -883,6 +908,41 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { } } +test("rename temporary view - destination table already exists, with sql: CREATE TEMPORARY view") { --- End diff -- ok, done
[GitHub] spark issue #20260: [SPARK-23039][SQL] Finish TODO work in alter table set l...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20260 @gatorsmile Please review it. This is your TODO work.
[GitHub] spark pull request #20260: [SPARK-23039][SQL] Finish TODO work in alter tabl...
GitHub user xubo245 opened a pull request: https://github.com/apache/spark/pull/20260 [SPARK-23039][SQL] Finish TODO work in alter table set location. ## What changes were proposed in this pull request? Finish TODO work in alter table set location. org.apache.spark.sql.execution.command.DDLSuite#testSetLocation // TODO(gatorsmile): fix the bug in alter table set location. //if (isUsingHiveMetastore) { //assert(storageFormat.properties.get("path") === expected) // } fix it by removing newPath = None in org.apache.spark.sql.hive.HiveExternalCatalog#restoreDataSourceTable ## How was this patch tested? test("SPARK-23039: check path after SET LOCATION") Wait for https://github.com/apache/spark/pull/20249 You can merge this pull request into a Git repository by running: $ git pull https://github.com/xubo245/spark setLocationTODO Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20260.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20260 commit 76c1813cf6e0e0e0d085cd31dcf1633c80829eff Author: xubo245 <601450868@...> Date: 2018-01-13T13:53:52Z [SPARK-23039][SQL] Fix the bug in alter table set location. TODO work: Fix the bug in alter table set location. org.apache.spark.sql.execution.command.DDLSuite#testSetLocation // TODO(gatorsmile): fix the bug in alter table set location. //if (isUsingHiveMetastore) { //assert(storageFormat.properties.get("path") === expected) // }
[GitHub] spark pull request #20227: [SPARK-23035][SQL] Fix warning: TEMPORARY TABLE ....
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20227#discussion_r161367362 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -814,7 +814,7 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { withTempView("tab1") { sql( """ - |CREATE TEMPORARY TABLE tab1 --- End diff -- Ok, done. I kept the old test cases for test coverage, and added new test cases for the temp view
[GitHub] spark pull request #20227: [SPARK-23035][SQL] Fix warning: TEMPORARY TABLE ....
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20227#discussion_r161367322 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AlreadyExistException.scala --- @@ -33,6 +33,9 @@ class TableAlreadyExistsException(db: String, table: String) class TempTableAlreadyExistsException(table: String) extends AnalysisException(s"Temporary table '$table' already exists") +class TempViewAlreadyExistsException(table: String) + extends AnalysisException(s"Temporary view '$table' already exists") --- End diff -- ok, I will remove the new exception
[GitHub] spark pull request #20227: [SPARK-23035][SQL] Fix warning: TEMPORARY TABLE ....
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20227#discussion_r161366875 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AlreadyExistException.scala --- @@ -33,6 +33,9 @@ class TableAlreadyExistsException(db: String, table: String) class TempTableAlreadyExistsException(table: String) extends AnalysisException(s"Temporary table '$table' already exists") --- End diff -- How about the class name? TempTableAlreadyExistsException
[GitHub] spark issue #20228: [SPARK-23036][SQL][TEST] Add withGlobalTempView for test...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20228 @gatorsmile Sure. This is only for TEST. Done, I put '[SQL]' into the title too
[GitHub] spark issue #20228: [SPARK-23036][SQL][TEST] Add withGlobalTempView for test...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20228 ok, done, I put '[SQL]' into the title. @dongjoon-hyun
[GitHub] spark issue #20250: [SPARK-23059][SQL][TEST] Correct some improper with view...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20250 @dongjoon-hyun Please review it.
[GitHub] spark pull request #20228: [SPARK-23036][SQL][TEST] Add withGlobalTempView f...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20228#discussion_r161253370 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/GlobalTempViewSuite.scala --- @@ -140,8 +140,8 @@ class GlobalTempViewSuite extends QueryTest with SharedSQLContext { assert(spark.catalog.listTables(globalTempDB).collect().toSeq.map(_.name) == Seq("v1", "v2")) } finally { - spark.catalog.dropTempView("v1") - spark.catalog.dropGlobalTempView("v2") + spark.catalog.dropGlobalTempView("v1") + spark.catalog.dropTempView("v2") --- End diff -- Ok, done. Please review: https://github.com/apache/spark/pull/20250
[GitHub] spark pull request #20250: [SPARK-23059][SQL][TEST] Correct some improper wi...
GitHub user xubo245 opened a pull request: https://github.com/apache/spark/pull/20250 [SPARK-23059][SQL][TEST] Correct some improper with view related method usage ## What changes were proposed in this pull request? Correct some improper with view related method usage Only change test cases like: ``` test("list global temp views") { try { sql("CREATE GLOBAL TEMP VIEW v1 AS SELECT 3, 4") sql("CREATE TEMP VIEW v2 AS SELECT 1, 2") checkAnswer(sql(s"SHOW TABLES IN $globalTempDB"), Row(globalTempDB, "v1", true) :: Row("", "v2", true) :: Nil) assert(spark.catalog.listTables(globalTempDB).collect().toSeq.map(_.name) == Seq("v1", "v2")) } finally { spark.catalog.dropTempView("v1") spark.catalog.dropGlobalTempView("v2") } } ``` ## How was this patch tested? See test case. You can merge this pull request into a Git repository by running: $ git pull https://github.com/xubo245/spark DropTempViewError Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20250.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20250 commit 8e49445f68db89b8a01b3eeb9c6da74191bc9a86 Author: xubo245 <601450868@...> Date: 2018-01-12T15:34:32Z [SPARK-23059][SQL][TEST] Correct some improper with view related method usage
[GitHub] spark issue #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION should chan...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20249 @gatorsmile Please review it
[GitHub] spark pull request #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION shou...
GitHub user xubo245 opened a pull request: https://github.com/apache/spark/pull/20249 [SPARK-23057][SPARK-19235][SQL] SET LOCATION should change the path of partition in table ## What changes were proposed in this pull request? Fix error of SET LOCATION: SET LOCATION should change the path of the partition in the table ## How was this patch tested? add test cases: test("SPARK-23057: path option always represent the value of table location with partition") test("SPARK-23057: SET LOCATION for managed table with partition") test("SPARK-23057: SET LOCATION should change the path of partition in table") You can merge this pull request into a Git repository by running: $ git pull https://github.com/xubo245/spark setPartitionPath Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20249.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20249 commit ff21db1cc3ccdef8b1028583ecb11ca0e27c2e7d Author: xubo245 <601450868@...> Date: 2018-01-12T14:52:21Z [SPARK-23057][SPARK-19235][SQL] SET LOCATION should change the path of partition in table
[GitHub] spark pull request #16592: [SPARK-19235] [SQL] [TESTS] Enable Test Cases in ...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/16592#discussion_r160889706 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -1082,24 +1173,21 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach { convertToDatasourceTable(catalog, tableIdent) } assert(catalog.getTableMetadata(tableIdent).storage.locationUri.isDefined) -assert(catalog.getTableMetadata(tableIdent).storage.properties.isEmpty) + assert(normalizeSerdeProp(catalog.getTableMetadata(tableIdent).storage.properties).isEmpty) assert(catalog.getPartition(tableIdent, partSpec).storage.locationUri.isDefined) -assert(catalog.getPartition(tableIdent, partSpec).storage.properties.isEmpty) +assert( + normalizeSerdeProp(catalog.getPartition(tableIdent, partSpec).storage.properties).isEmpty) + // Verify that the location is set to the expected string def verifyLocation(expected: URI, spec: Option[TablePartitionSpec] = None): Unit = { val storageFormat = spec .map { s => catalog.getPartition(tableIdent, s).storage } .getOrElse { catalog.getTableMetadata(tableIdent).storage } - if (isDatasourceTable) { -if (spec.isDefined) { - assert(storageFormat.properties.isEmpty) - assert(storageFormat.locationUri === Some(expected)) -} else { - assert(storageFormat.locationUri === Some(expected)) -} - } else { -assert(storageFormat.locationUri === Some(expected)) - } + // TODO(gatorsmile): fix the bug in alter table set location. + // if (isUsingHiveMetastore) { + // assert(storageFormat.properties.get("path") === expected) --- End diff -- Do we need to fix this bug and satisfy this test case? When porting these test cases, a bug in SET LOCATION was found: path is not set when the location is changed.
[GitHub] spark issue #20195: [SPARK-22972][SQL] Couldn't find corresponding Hive SerD...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20195 ok @dongjoon-hyun
[GitHub] spark pull request #20195: [SPARK-22972][SQL] Couldn't find corresponding Hi...
Github user xubo245 closed the pull request at: https://github.com/apache/spark/pull/20195
[GitHub] spark pull request #20228: [SPARK-23036] Add withGlobalTempView for testing ...
GitHub user xubo245 opened a pull request: https://github.com/apache/spark/pull/20228 [SPARK-23036] Add withGlobalTempView for testing and correct some improper with view related method usage ## What changes were proposed in this pull request? Add withGlobalTempView when create global temp view, like withTempView and withView. And correct some improper usage. ## How was this patch tested? no new test. You can merge this pull request into a Git repository by running: $ git pull https://github.com/xubo245/spark DropTempView Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20228.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20228 commit fffd109e8c084f9a4d63840bf761364f1ede5dc9 Author: xubo245 <601450868@...> Date: 2018-01-11T03:25:17Z [SPARK-23036] Add withGlobalTempView for testing and correct some improper with view related method usage Add withGlobalTempView when create global temp view, like withTempView and withView. And correct some improper usage.
[GitHub] spark pull request #20227: [SPARK-23035] Fix warning: TEMPORARY TABLE ... US...
GitHub user xubo245 opened a pull request: https://github.com/apache/spark/pull/20227 [SPARK-23035] Fix warning: TEMPORARY TABLE ... USING ... is deprecated and use TempViewAlreadyExistsException when create temp view Fix warning: TEMPORARY TABLE ... USING ... is deprecated and use TempViewAlreadyExistsException when create temp view There are warnings when running test: test("rename temporary view - destination table with database name") Another problem: it throws TempTableAlreadyExistsException and outputs "Temporary table '$table' already exists" when we create a temp view by using org.apache.spark.sql.catalyst.catalog.GlobalTempViewManager#create, which is improper. ## What changes were proposed in this pull request? Fix some warnings by changing "TEMPORARY TABLE ... USING ... " to "TEMPORARY VIEW ... USING ... " Fix improper information about TempTableAlreadyExistsException when create temp view ## How was this patch tested? use old test cases, such as " test("create temporary view using") " You can merge this pull request into a Git repository by running: $ git pull https://github.com/xubo245/spark fixDeprecated Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20227.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20227 commit b97834a58fb2a0a98eb2645bd9e77e97209b Author: xubo245 <601450868@...> Date: 2018-01-11T01:58:48Z [SPARK-23035] Fix warning: TEMPORARY TABLE ... USING ... is deprecated and use TempViewAlreadyExistsException when create temp view Fix warning: TEMPORARY TABLE ... USING ... is deprecated and use TempViewAlreadyExistsException when create temp view There are warnings when running test: test("rename temporary view - destination table with database name") Another problem: it throws TempTableAlreadyExistsException and outputs "Temporary table '$table' already exists" when we create a temp view by using org.apache.spark.sql.catalyst.catalog.GlobalTempViewManager#create, which is improper.
[GitHub] spark issue #20195: [SPARK-22972] Couldn't find corresponding Hive SerDe for...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20195 I submitted a separate PR for 2.2 here; please review it. @gatorsmile
[GitHub] spark pull request #20195: [SPARK-22972] Couldn't find corresponding Hive Se...
GitHub user xubo245 opened a pull request: https://github.com/apache/spark/pull/20195 [SPARK-22972] Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.hive.orc ## What changes were proposed in this pull request? Fix the warning: Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.hive.orc. For branch-2.2, it is cherry-picked from https://github.com/apache/spark/commit/8032cf852fccd0ab8754f633affdc9ba8fc99e58 ## How was this patch tested? test("SPARK-22972: hive orc source") You can merge this pull request into a Git repository by running: $ git pull https://github.com/xubo245/spark HiveSerDeForBranch2.2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20195.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20195 commit c65f3efd6270adc5c8708e100263379758fd5d82 Author: xubo245 <601450868@...> Date: 2018-01-09T02:15:01Z [SPARK-22972] Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.hive.orc Fix the warning: Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.hive.orc. For branch-2.2, it is cherry-picked from https://github.com/apache/spark/commit/8032cf852fccd0ab8754f633affdc9ba8fc99e58 test("SPARK-22972: hive orc source") assert(HiveSerDe.sourceToSerDe("org.apache.spark.sql.hive.orc") .equals(HiveSerDe.sourceToSerDe("orc"))) Author: xubo245 <601450...@qq.com> Closes #20165 from xubo245/HiveSerDe.
[GitHub] spark issue #20165: [SPARK-22972] Couldn't find corresponding Hive SerDe for...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20165 Thank you too. Ok, I will raise a PR for this later.
[GitHub] spark pull request #20165: [SPARK-22972] Couldn't find corresponding Hive Se...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20165#discussion_r160135992 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala --- @@ -62,6 +64,33 @@ class HiveOrcSourceSuite extends OrcSuite with TestHiveSingleton { """.stripMargin) } + test("SPARK-22972: hive orc source") { +val tableName = "normal_orc_as_source_hive" +withTable(tableName) { + --- End diff -- ok
[GitHub] spark pull request #20165: [SPARK-22972] Couldn't find corresponding Hive Se...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20165#discussion_r160136011 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala --- @@ -62,6 +64,33 @@ class HiveOrcSourceSuite extends OrcSuite with TestHiveSingleton { """.stripMargin) } + test("SPARK-22972: hive orc source") { +val tableName = "normal_orc_as_source_hive" +withTable(tableName) { + + sql( +s"""CREATE TABLE $tableName + |USING org.apache.spark.sql.hive.orc + |OPTIONS ( + | PATH '${new File(orcTableAsDir.getAbsolutePath).toURI}' + |) + """.stripMargin) --- End diff -- ok
[GitHub] spark pull request #20165: [SPARK-22972] Couldn't find corresponding Hive Se...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20165#discussion_r160067418 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala --- @@ -62,6 +63,22 @@ class HiveOrcSourceSuite extends OrcSuite with TestHiveSingleton { """.stripMargin) } + test("SPARK-22972: hive orc source") { +spark.sql( --- End diff -- done
[GitHub] spark pull request #20165: [SPARK-22972] Couldn't find corresponding Hive Se...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20165#discussion_r160067425 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala --- @@ -62,6 +63,22 @@ class HiveOrcSourceSuite extends OrcSuite with TestHiveSingleton { """.stripMargin) } + test("SPARK-22972: hive orc source") { +spark.sql( + s"""CREATE TABLE normal_orc_as_source_hive --- End diff -- done
[GitHub] spark pull request #20165: [SPARK-22972] Couldn't find corresponding Hive Se...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20165#discussion_r160067384 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala --- @@ -62,6 +63,22 @@ class HiveOrcSourceSuite extends OrcSuite with TestHiveSingleton { """.stripMargin) } + test("SPARK-22972: hive orc source") { +spark.sql( + s"""CREATE TABLE normal_orc_as_source_hive + |USING org.apache.spark.sql.hive.orc + |OPTIONS ( + | PATH '${new File(orcTableAsDir.getAbsolutePath).toURI}' + |) + """.stripMargin) +spark.sql("desc formatted normal_orc_as_source_hive").show() --- End diff -- done
[GitHub] spark pull request #20165: [SPARK-22972] Couldn't find corresponding Hive Se...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20165#discussion_r159879314 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala --- @@ -62,6 +63,22 @@ class HiveOrcSourceSuite extends OrcSuite with TestHiveSingleton { """.stripMargin) } + test("SPARK-22972: hive orc source") { +spark.sql( + s"""CREATE TABLE normal_orc_as_source_hive + |USING org.apache.spark.sql.hive.orc + |OPTIONS ( + | PATH '${new File(orcTableAsDir.getAbsolutePath).toURI}' + |) + """. +stripMargin) +spark.sql( + "desc formatted normal_orc_as_source_hive").show() +checkAnswer(sql("SELECT COUNT(*) FROM normal_orc_as_source_hive"), Row(10)) +assert(HiveSerDe.sourceToSerDe("org.apache.spark.sql.hive.orc") + .equals(HiveSerDe.sourceToSerDe("orc"))) --- End diff -- ok
[GitHub] spark pull request #20165: [SPARK-22972] Couldn't find corresponding Hive Se...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20165#discussion_r159879115 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala --- @@ -62,6 +63,22 @@ class HiveOrcSourceSuite extends OrcSuite with TestHiveSingleton { """.stripMargin) } + test("SPARK-22972: hive orc source") { +spark.sql( + s"""CREATE TABLE normal_orc_as_source_hive + |USING org.apache.spark.sql.hive.orc + |OPTIONS ( + | PATH '${new File(orcTableAsDir.getAbsolutePath).toURI}' + |) + """. +stripMargin) +spark.sql( + "desc formatted normal_orc_as_source_hive").show() --- End diff -- I changed it to spark.sql("desc formatted normal_orc_as_source_hive").show(); is that ok? How can I capture the warning and verify it in code?
[GitHub] spark pull request #20165: [SPARK-22972] Couldn't find corresponding Hive Se...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20165#discussion_r159878689 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala --- @@ -62,6 +63,22 @@ class HiveOrcSourceSuite extends OrcSuite with TestHiveSingleton { """.stripMargin) } + test("SPARK-22972: hive orc source") { +spark.sql( + s"""CREATE TABLE normal_orc_as_source_hive + |USING org.apache.spark.sql.hive.orc + |OPTIONS ( + | PATH '${new File(orcTableAsDir.getAbsolutePath).toURI}' + |) + """. +stripMargin) --- End diff -- ok
[GitHub] spark pull request #20165: [SPARK-22972] Couldn't find corresponding Hive Se...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/20165#discussion_r159878683 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/HiveSerDe.scala --- @@ -72,7 +72,7 @@ object HiveSerDe { def sourceToSerDe(source: String): Option[HiveSerDe] = { val key = source.toLowerCase(Locale.ROOT) match { case s if s.startsWith("org.apache.spark.sql.parquet") => "parquet" - case s if s.startsWith("org.apache.spark.sql.orc") => "orc" --- End diff -- ok
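The diff above is about making the hive-internal ORC provider name resolve to the same SerDe key as the public one. A minimal standalone sketch of the idea follows; the method name and the plain String result are simplified stand-ins, not Spark's actual HiveSerDe API, which returns Option[HiveSerDe]:

```scala
import java.util.Locale

// Simplified stand-in for HiveSerDe.sourceToSerDe: map a data source
// provider name to a SerDe key.
def sourceToSerDeKey(source: String): Option[String] =
  source.toLowerCase(Locale.ROOT) match {
    case s if s.startsWith("org.apache.spark.sql.parquet") => Some("parquet")
    case s if s.startsWith("org.apache.spark.sql.orc") => Some("orc")
    // The added case: "org.apache.spark.sql.hive.orc" does not share the
    // "org.apache.spark.sql.orc" prefix, so it needs its own match to
    // avoid the "Couldn't find corresponding Hive SerDe" warning.
    case s if s.startsWith("org.apache.spark.sql.hive.orc") => Some("orc")
    case "orc" => Some("orc")
    case "parquet" => Some("parquet")
    case _ => None
  }
```

With this extra case, the hive-internal provider name and the short "orc" alias resolve to the same key, which is exactly what the PR's assertion checks.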
[GitHub] spark pull request #20165: [SPARK-22972] Couldn't find corresponding Hive Se...
GitHub user xubo245 opened a pull request: https://github.com/apache/spark/pull/20165 [SPARK-22972] Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.hive.orc ## What changes were proposed in this pull request? Fix the warning: Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.hive.orc. (Please fill in changes proposed in this fix) ## How was this patch tested? test("SPARK-22972: hive orc source") assert(HiveSerDe.sourceToSerDe("org.apache.spark.sql.hive.orc") .equals(HiveSerDe.sourceToSerDe("orc"))) (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/xubo245/spark HiveSerDe Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20165.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20165 commit fa902d6d3fb635236ac01ee5b43470359f16cfdd Author: xubo245 <601450868@...> Date: 2018-01-05T13:20:53Z [SPARK-22972] Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.hive.orc. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20044: [SPARK-22857] Optimize code by inspecting code
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20044 @srowen I found all the related array-size warnings: ![4](https://user-images.githubusercontent.com/8759816/34340165-ccdb3c7a-e9b9-11e7-827f-484283ce97f1.PNG) Are there any other issues?
[GitHub] spark issue #20044: [SPARK-22857] Optimize code by inspecting code
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20044 Ok, I removed some changes.
[GitHub] spark issue #20044: [SPARK-22857] Optimize code by inspecting code
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20044 ok
[GitHub] spark pull request #20044: [SPARK-22857] Optimize code by inspecting code
GitHub user xubo245 opened a pull request: https://github.com/apache/spark/pull/20044 [SPARK-22857] Optimize code by inspecting code ## What changes were proposed in this pull request? Optimize code by inspecting code, including: remove some unused imports; change array size method calls to the length method; use the head method. You can merge this pull request into a Git repository by running: $ git pull https://github.com/xubo245/spark spark2.2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20044.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20044 commit aab41182a333b6f6b3e58624f896dfa668d75842 Author: xubo245 <601450868@...> Date: 2017-11-27T12:26:58Z [SPARK-22857] Optimize code by inspecting code Optimize code by inspecting code, including: remove some unused imports; change array size method calls to the length method; use the head method
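The inspection-driven cleanups listed in the PR description can be illustrated with a small standalone snippet; the values here are made up for demonstration and are not from the Spark codebase:

```scala
// Before/after for the cleanups above.
val xs = Array(1, 2, 3)
val sizeViaWrapper = xs.size   // before: .size on an Array goes through an implicit ArrayOps wrapper
val lengthDirect = xs.length   // after: .length reads the array's length field directly

val seq = Seq(10, 20, 30)
val first = seq.head           // after: seq.head instead of seq(0) for the first element
```

Both forms return the same values; the point of the cleanup is idiom and avoiding the implicit-conversion indirection, not behavior.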
[GitHub] spark issue #19639: [SPARK-22423][SQL] Scala test source files like TestHive...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/19639 I have fixed all four instances and updated the PR title.
[GitHub] spark pull request #19639: [SPARK-22423][SQL] The TestHiveSingleton.scala fi...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/19639#discussion_r148764458 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/test/TestHiveSingleton.scala --- @@ -24,7 +24,6 @@ import org.apache.spark.sql.SparkSession import org.apache.spark.sql.hive.HiveExternalCatalog import org.apache.spark.sql.hive.client.HiveClient - --- End diff -- ok
[GitHub] spark issue #19639: [SPARK-22423][SQL] The TestHiveSingleton.scala file shou...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/19639 Ok, I will fix all of them and update the PR title later.
[GitHub] spark pull request #19639: [SPARK-22423][SQL] The TestHiveSingleton.scala fi...
GitHub user xubo245 opened a pull request: https://github.com/apache/spark/pull/19639 [SPARK-22423][SQL] The TestHiveSingleton.scala file should be in scala directory ## What changes were proposed in this pull request? The TestHiveSingleton.scala file is moved from the java directory into the scala directory. ## How was this patch tested? It is a base test class for Hive; no new test case in this PR. You can merge this pull request into a Git repository by running: $ git pull https://github.com/xubo245/spark scalaDirectory Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19639.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19639 commit bfda9345c618a9656ccbe7b472e9e1963d325b45 Author: xubo245 <601450...@qq.com> Date: 2017-11-02T07:30:52Z [SPARK-22423][SQL] The TestHiveSingleton.scala file should be in scala directory
[GitHub] spark pull request #14422: Add rand(numRows: Int, numCols: Int) functions
Github user xubo245 closed the pull request at: https://github.com/apache/spark/pull/14422 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
[GitHub] spark pull request #14422: Add rand(numRows: Int, numCols: Int) functions
GitHub user xubo245 reopened a pull request: https://github.com/apache/spark/pull/14422 Add rand(numRows: Int, numCols: Int) functions ## What changes were proposed in this pull request? Add rand(numRows: Int, numCols: Int) functions to the DenseMatrix object, like breeze.linalg.DenseMatrix.rand() You can merge this pull request into a Git repository by running: $ git pull https://github.com/xubo245/spark patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14422.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14422 commit a7a1261f52112a3bca375dd0bed1c1bc0a2e0ed8 Author: 徐波 <601450...@qq.com> Date: 2016-07-30T15:43:36Z Add rand(numRows: Int, numCols: Int) functions add rand(numRows: Int, numCols: Int) functions to the DenseMatrix object, like breeze.linalg.DenseMatrix.rand() commit 054b70ccce73c02cce04caf9f7958cfc555df829 Author: 徐波 <601450...@qq.com> Date: 2016-07-30T16:36:30Z fix RNG fix RNG; this uses one RNG for all elements
[GitHub] spark issue #14422: Add rand(numRows: Int, numCols: Int) functions
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/14422 ok
[GitHub] spark pull request #14422: Add rand(numRows: Int, numCols: Int) functions
Github user xubo245 closed the pull request at: https://github.com/apache/spark/pull/14422
[GitHub] spark issue #14422: Add rand(numRows: Int, numCols: Int) functions
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/14422 @srowen sorry, please close the issue. I will learn more before my next PR. This PR was only because breeze has the function; in Spark there is no use for it. Could you point me to some starter issues, please?
[GitHub] spark issue #14422: Add rand(numRows: Int, numCols: Int) functions
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/14422 @HyukjinKwon Thank you. This is my first pull request to Spark. Sorry, I will follow https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark from now on.
[GitHub] spark issue #14424: Add test:DenseMatrix.rand with no rng
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/14424 It is added for https://github.com/apache/spark/pull/14422
[GitHub] spark issue #14422: Add rand(numRows: Int, numCols: Int) functions
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/14422 I added a test: https://github.com/apache/spark/pull/14424
[GitHub] spark pull request #14424: Add test:DenseMatrix.rand with no rng
GitHub user xubo245 opened a pull request: https://github.com/apache/spark/pull/14424 Add test: DenseMatrix.rand with no rng ## What changes were proposed in this pull request? Add a test for DenseMatrix.rand with no rng. You can merge this pull request into a Git repository by running: $ git pull https://github.com/xubo245/spark patch-3 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14424.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14424 commit 2e945479af45fdadd8abb4529173db04226de64e Author: 徐波 <601450...@qq.com> Date: 2016-07-30T18:00:01Z Add test: DenseMatrix.rand with no rng
[GitHub] spark pull request #14422: Add rand(numRows: Int, numCols: Int) functions
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/14422#discussion_r72890434 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -497,6 +497,20 @@ object DenseMatrix { } /** +* Generate a `DenseMatrix` consisting of `i.i.d.` uniform random numbers. +* +* @param numRows number of rows of the matrix +* @param numCols number of columns of the matrix +* @return `DenseMatrix` with size `numRows` x `numCols` and values in U(0, 1) +*/ + @Since("2.0.0") + def rand(numRows: Int, numCols: Int): DenseMatrix = { +require(numRows.toLong * numCols <= Int.MaxValue, + s"$numRows x $numCols dense matrix is too large to allocate") +new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)((new Random).nextDouble())) --- End diff -- We can fix the RNG so that one RNG is shared by all elements: val rng = new Random() new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextDouble()))
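The review point above (allocate one Random and reuse it, instead of a new Random per element as in the original diff) can be sketched without the Spark DenseMatrix class. The helper name below is hypothetical and it returns only the backing Array[Double]:

```scala
import scala.util.Random

// Hypothetical helper sketching the review suggestion: one RNG, created
// once, fills every element of the column-major value array.
def randValues(numRows: Int, numCols: Int): Array[Double] = {
  require(numRows.toLong * numCols <= Int.MaxValue,
    s"$numRows x $numCols dense matrix is too large to allocate")
  val rng = new Random() // shared by all elements, not re-created per element
  Array.fill(numRows * numCols)(rng.nextDouble())
}
```

Besides avoiding per-element allocation, sharing one RNG also lets a caller pass a seeded Random for reproducible matrices, which the per-element version cannot do.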
[GitHub] spark pull request #14423: Add zeros(size: Int) function
GitHub user xubo245 opened a pull request: https://github.com/apache/spark/pull/14423 Add zeros(size: Int) function ## What changes were proposed in this pull request? Generate a `DenseVector` consisting of zeros. It can replace breeze.linalg.DenseVector#zeros[Double] You can merge this pull request into a Git repository by running: $ git pull https://github.com/xubo245/spark patch-2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14423.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14423 commit b111cdb9236c87af18e1ea773cea72b73fc68561 Author: 徐波 <601450...@qq.com> Date: 2016-07-30T16:30:47Z Add zeros(size: Int) function Generate a `DenseVector` consisting of zeros. It can replace breeze.linalg.DenseVector#zeros[Double]
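A zeros(size) helper of the kind this PR proposes amounts to allocating a zero-initialized backing array. A hedged standalone sketch, with a plain Array[Double] standing in for the DenseVector wrapper:

```scala
// Sketch of the proposed helper: JVM arrays of Double are zero-initialized
// on allocation, so a vector of zeros only needs a fresh backing array.
def zeros(size: Int): Array[Double] = {
  require(size >= 0, s"size must be non-negative, got $size")
  new Array[Double](size)
}
```

In the actual proposal the array would be wrapped in a DenseVector, mirroring breeze.linalg.DenseVector.zeros[Double](size).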
[GitHub] spark issue #14422: Add rand(numRows: Int, numCols: Int) functions
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/14422 We can use it to replace breeze.linalg.DenseMatrix.rand(numRows: Int, numCols: Int).
[GitHub] spark pull request #14422: Add rand(numRows: Int, numCols: Int) functions
GitHub user xubo245 opened a pull request: https://github.com/apache/spark/pull/14422 Add rand(numRows: Int, numCols: Int) functions ## What changes were proposed in this pull request? Add rand(numRows: Int, numCols: Int) functions to the DenseMatrix object, like breeze.linalg.DenseMatrix.rand() You can merge this pull request into a Git repository by running: $ git pull https://github.com/xubo245/spark patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14422.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14422 commit a7a1261f52112a3bca375dd0bed1c1bc0a2e0ed8 Author: 徐波 <601450...@qq.com> Date: 2016-07-30T15:43:36Z Add rand(numRows: Int, numCols: Int) functions add rand(numRows: Int, numCols: Int) functions to the DenseMatrix object, like breeze.linalg.DenseMatrix.rand()