[GitHub] spark pull request #13389: [SPARK-9876][SQL][FOLLOWUP] Enable string and bin...

2016-07-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13389





[GitHub] spark pull request #13389: [SPARK-9876][SQL][FOLLOWUP] Enable string and bin...

2016-07-05 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13389#discussion_r69527639
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystWriteSupport.scala ---
@@ -150,7 +150,8 @@ private[parquet] class CatalystWriteSupport extends WriteSupport[InternalRow] wi
 
       case StringType =>
         (row: SpecializedGetters, ordinal: Int) =>
-          recordConsumer.addBinary(Binary.fromByteArray(row.getUTF8String(ordinal).getBytes))
+          recordConsumer.addBinary(
+            Binary.fromReusedByteArray(row.getUTF8String(ordinal).getBytes))
--- End diff --

`UTF8String` itself is immutable, but the underlying buffer it points to 
can be mutable. I'd vote for using `Binary.fromReusedByteArray` here.
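
The following is a minimal illustrative sketch of that concern, not code from this PR; it assumes `UTF8String.fromBytes` wraps the supplied array without copying, so the immutable `UTF8String` still observes later writes to that array:

    import org.apache.parquet.io.api.Binary
    import org.apache.spark.unsafe.types.UTF8String

    // An immutable UTF8String view over a buffer the caller still owns.
    val buffer = "abc".getBytes("UTF-8")
    val str = UTF8String.fromBytes(buffer)

    // Marking the bytes as "reused" tells Parquet they may change after this
    // call, so it knows to copy them before retaining them (for example in
    // column min/max statistics).
    val binary = Binary.fromReusedByteArray(str.getBytes)

    // A later overwrite of the shared buffer then cannot corrupt the value
    // Parquet has already retained.
    buffer(0) = 'z'.toByte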





[GitHub] spark pull request #13389: [SPARK-9876][SQL][FOLLOWUP] Enable string and bin...

2016-07-01 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/13389#discussion_r69280087
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystWriteSupport.scala ---
@@ -150,7 +150,8 @@ private[parquet] class CatalystWriteSupport extends WriteSupport[InternalRow] wi
 
       case StringType =>
         (row: SpecializedGetters, ordinal: Int) =>
-          recordConsumer.addBinary(Binary.fromByteArray(row.getUTF8String(ordinal).getBytes))
+          recordConsumer.addBinary(
+            Binary.fromReusedByteArray(row.getUTF8String(ordinal).getBytes))
--- End diff --

Thank you for your review! (Actually, the value here is a `UTF8String`, so it 
would have to be converted into a `String` before `Binary.fromString` could be 
used.) Still, I am a bit worried that the underlying buffer might be reused in 
the future, even though I believe it is not reused for now.

If the buffer were reused, this could write corrupt statistics. Is my 
understanding correct?
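
For context, a hedged sketch of what the `Binary.fromString` route would look like, written as a fragment of the same `StringType` writer; the assumption is that the extra `UTF8String` to `String` conversion (with Parquet re-encoding that `String` back to UTF-8 bytes) is the cost being weighed, and this is not what the PR adopted:

    case StringType =>
      (row: SpecializedGetters, ordinal: Int) =>
        // toString copies the UTF8String out into a java.lang.String, so the
        // bytes Parquet ends up holding are never shared with a reusable
        // buffer and no reuse flag has to be chosen.
        recordConsumer.addBinary(
          Binary.fromString(row.getUTF8String(ordinal).toString))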





[GitHub] spark pull request #13389: [SPARK-9876][SQL][FOLLOWUP] Enable string and bin...

2016-06-30 Thread rdblue
Github user rdblue commented on a diff in the pull request:

https://github.com/apache/spark/pull/13389#discussion_r69158244
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystWriteSupport.scala ---
@@ -150,7 +150,8 @@ private[parquet] class CatalystWriteSupport extends WriteSupport[InternalRow] wi
 
       case StringType =>
         (row: SpecializedGetters, ordinal: Int) =>
-          recordConsumer.addBinary(Binary.fromByteArray(row.getUTF8String(ordinal).getBytes))
+          recordConsumer.addBinary(
+            Binary.fromReusedByteArray(row.getUTF8String(ordinal).getBytes))
--- End diff --

If you're converting from a `String`, you can use `Binary.fromString`, 
which sets the reuse flag correctly. It looks like this should be using the 
"constant" variant, which signals to Parquet that the underlying bytes won't be 
changed by the program after they have been passed to Parquet.
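
A small sketch contrasting what each factory method signals, assuming the `org.apache.parquet.io.api.Binary` API behaves as described above; none of this is code from the PR:

    import org.apache.parquet.io.api.Binary

    val bytes = "spark".getBytes("UTF-8")

    // The caller may overwrite `bytes` later, so Parquet copies them before
    // retaining the value (e.g. for statistics or dictionary entries).
    val reused = Binary.fromReusedByteArray(bytes)

    // The caller promises never to touch `bytes` again, so Parquet may keep
    // a reference without copying.
    val constant = Binary.fromConstantByteArray(bytes)

    // Strings are immutable, so fromString can set the reuse flag correctly
    // on its own.
    val fromString = Binary.fromString("spark")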

