[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-11-13 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r150618779 --- Diff: python/pyspark/sql/session.py --- @@ -438,28 +438,70 @@ def _get_numpy_record_dtypes(self, rec): curr_type =

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-11-12 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19459

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r150445421 --- Diff: python/pyspark/sql/session.py --- @@ -438,28 +438,70 @@ def _get_numpy_record_dtypes(self, rec): curr_type =

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-11-10 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r150324146 --- Diff: python/pyspark/sql/session.py --- @@ -454,13 +454,60 @@ def _convert_from_pandas(self, pdf, schema): # Check if any columns

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-11-10 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r150321176 --- Diff: python/pyspark/sql/session.py --- @@ -454,13 +454,60 @@ def _convert_from_pandas(self, pdf, schema): # Check if any columns

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-11-10 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r150314957 --- Diff: python/pyspark/sql/session.py --- @@ -454,13 +454,60 @@ def _convert_from_pandas(self, pdf, schema): # Check if any columns

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-11-10 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r150308054 --- Diff: python/pyspark/serializers.py --- @@ -225,11 +232,11 @@ def _create_batch(series): # If a nullable integer series has been promoted

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-11-10 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r150305976 --- Diff: python/pyspark/serializers.py --- @@ -225,11 +232,11 @@ def _create_batch(series): # If a nullable integer series has been promoted

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-11-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r150228738 --- Diff: python/pyspark/sql/session.py --- @@ -454,13 +454,60 @@ def _convert_from_pandas(self, pdf, schema): # Check if any columns

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-11-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r150229857 --- Diff: python/pyspark/sql/tests.py --- @@ -3180,6 +3185,58 @@ def test_filtered_frame(self): self.assertEqual(pdf.columns[0], "i")

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-11-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r150229941 --- Diff: python/pyspark/sql/tests.py --- @@ -3180,6 +3185,58 @@ def test_filtered_frame(self): self.assertEqual(pdf.columns[0], "i")

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-11-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r150227714 --- Diff: python/pyspark/sql/session.py --- @@ -454,13 +454,60 @@ def _convert_from_pandas(self, pdf, schema): # Check if any columns

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-11-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r150228512 --- Diff: python/pyspark/sql/session.py --- @@ -454,13 +454,60 @@ def _convert_from_pandas(self, pdf, schema): # Check if any columns

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-11-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r150206108 --- Diff: python/pyspark/serializers.py --- @@ -225,11 +232,11 @@ def _create_batch(series): # If a nullable integer series has been promoted

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-11-09 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r149887860 --- Diff: python/pyspark/serializers.py --- @@ -214,6 +214,14 @@ def __repr__(self): def _create_batch(series): +""" +Create

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-11-08 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r149886093 --- Diff: python/pyspark/serializers.py --- @@ -213,7 +213,15 @@ def __repr__(self): return "ArrowSerializer" -def

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-11-08 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r149874063 --- Diff: python/pyspark/serializers.py --- @@ -213,7 +213,15 @@ def __repr__(self): return "ArrowSerializer" -def

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-11-08 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r149871432 --- Diff: python/pyspark/serializers.py --- @@ -213,7 +213,15 @@ def __repr__(self): return "ArrowSerializer" -def

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-11-08 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r149760058 --- Diff: python/pyspark/serializers.py --- @@ -213,7 +213,15 @@ def __repr__(self): return "ArrowSerializer" -def

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-11-08 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r149626358 --- Diff: python/pyspark/serializers.py --- @@ -213,7 +213,15 @@ def __repr__(self): return "ArrowSerializer" -def

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-24 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r146619813 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,52 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row)

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r146436616 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,52 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row)

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-23 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r146431078 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,73 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row)

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-23 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r146421804 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,73 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row)

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-23 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r146421602 --- Diff: python/pyspark/sql/session.py --- @@ -510,6 +578,12 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-23 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r146421298 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,73 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row)

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-19 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r145865969 --- Diff: python/pyspark/sql/session.py --- @@ -510,6 +578,12 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-19 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r145863796 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,73 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row) for

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-19 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r145862488 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,73 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row) for

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-19 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r145859471 --- Diff: python/pyspark/sql/session.py --- @@ -510,6 +578,12 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-19 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r145858362 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,73 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row)

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-19 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r145796514 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,73 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row)

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-19 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r145611544 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,73 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row) for

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-18 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r145603645 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,73 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row) for

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-18 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r145490665 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,39 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row)

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-18 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r145336060 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,39 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row)

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-18 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r145334576 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,39 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row) for

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r145294433 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,43 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row)

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r145293860 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,39 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row)

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r145293209 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,39 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row)

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r145292822 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,39 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row)

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-17 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r145291702 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,39 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row)

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-17 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r145200454 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,43 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row)

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r145039356 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,43 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row)

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-17 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r145035994 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,43 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row)

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-17 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r145034007 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,43 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row) for

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-16 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r145032365 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,43 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row) for

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-16 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r145032174 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,43 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row)

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-16 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r145029289 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,43 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row)

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-16 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r145029049 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,43 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row)

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-16 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r145028238 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,43 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row)

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-16 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144994355 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,43 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row) for

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-16 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144945183 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,43 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row)

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-16 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144931496 --- Diff: python/pyspark/sql/types.py --- @@ -1624,6 +1624,50 @@ def to_arrow_type(dt): return arrow_type +def

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-16 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144931384 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,43 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row)

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-16 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144931295 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/python/PythonSQLUtils.scala --- @@ -29,4 +32,19 @@ private[sql] object PythonSQLUtils {

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-16 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144931022 --- Diff: python/pyspark/sql/dataframe.py --- @@ -70,12 +70,12 @@ class DataFrame(object): .. versionadded:: 1.3 """ -

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-16 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144930424 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,43 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row)

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-16 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144926065 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,43 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row)

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-16 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144828405 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,43 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row) for

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-16 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144827985 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,43 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row) for

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-16 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144829187 --- Diff: python/pyspark/sql/types.py --- @@ -1624,6 +1624,50 @@ def to_arrow_type(dt): return arrow_type +def

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144750565 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/python/PythonSQLUtils.scala --- @@ -29,4 +32,19 @@ private[sql] object PythonSQLUtils {

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144750372 --- Diff: python/pyspark/sql/dataframe.py --- @@ -70,12 +70,12 @@ class DataFrame(object): .. versionadded:: 1.3 """ -

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-14 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144706910 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,43 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row) for

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-14 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144706853 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,43 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row) for

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-14 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144706672 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,43 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row) for

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144618995 --- Diff: python/pyspark/sql/session.py --- @@ -510,9 +511,43 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-13 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144601470 --- Diff: python/pyspark/sql/session.py --- @@ -510,9 +511,43 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144600676 --- Diff: python/pyspark/sql/session.py --- @@ -510,9 +511,43 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-13 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144599111 --- Diff: python/pyspark/sql/session.py --- @@ -510,9 +511,43 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-13 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144598051 --- Diff: python/pyspark/sql/session.py --- @@ -510,9 +511,43 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-13 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144597485 --- Diff: python/pyspark/sql/session.py --- @@ -510,9 +511,43 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-13 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144594930 --- Diff: python/pyspark/sql/session.py --- @@ -510,9 +511,43 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144340072 --- Diff: python/pyspark/sql/session.py --- @@ -510,9 +511,43 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144348462 --- Diff: python/pyspark/sql/session.py --- @@ -510,9 +511,43 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144339301 --- Diff: python/pyspark/sql/session.py --- @@ -510,9 +511,43 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-11 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r143906469 --- Diff: python/pyspark/sql/tests.py --- @@ -3095,16 +3095,32 @@ def setUpClass(cls): StructField("3_long_t", LongType(), True),

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-11 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144194374 --- Diff: python/pyspark/sql/session.py --- @@ -510,9 +511,43 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-11 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144194084 --- Diff: python/pyspark/sql/session.py --- @@ -510,9 +511,43 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-10 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r143821657 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala --- @@ -203,4 +205,16 @@ private[sql] object

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-10 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r143788090 --- Diff: python/pyspark/sql/session.py --- @@ -510,9 +511,43 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-09 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r143633890 --- Diff: python/pyspark/sql/session.py --- @@ -510,9 +511,43 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r143610100 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala --- @@ -203,4 +205,16 @@ private[sql] object

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-09 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r143607522 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala --- @@ -203,4 +205,16 @@ private[sql] object

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-09 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r143606693 --- Diff: python/pyspark/sql/tests.py --- @@ -3147,6 +3150,14 @@ def test_filtered_frame(self): self.assertEqual(pdf.columns[0], "i")

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-09 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r143605840 --- Diff: python/pyspark/sql/tests.py --- @@ -3147,6 +3150,14 @@ def test_filtered_frame(self): self.assertEqual(pdf.columns[0], "i")

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r143600411 --- Diff: python/pyspark/sql/tests.py --- @@ -3147,6 +3150,14 @@ def test_filtered_frame(self): self.assertEqual(pdf.columns[0], "i")

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-09 Thread BryanCutler
GitHub user BryanCutler opened a pull request: https://github.com/apache/spark/pull/19459 [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFrame from Pandas ## What changes were proposed in this pull request? This change uses Arrow to optimize the creation of a Spark