[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/9203#discussion_r42711166
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java
 ---
@@ -58,6 +58,10 @@ private void zeroOutPaddingBytes(int numBytes) {
 }
   }
 
+  public boolean isNullAt(int ordinal) {
--- End diff --

This is needed by larger Decimal, to call write(decimal) after setNullAt()


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/9203#discussion_r42711131
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java
 ---
@@ -95,41 +99,75 @@ public void alignToWords(int numBytes) {
 }
   }
 
-  public void writeCompactDecimal(int ordinal, Decimal input, int 
precision, int scale) {
-// make sure Decimal object has the same scale as DecimalType
-if (input.changePrecision(precision, scale)) {
-  Platform.putLong(holder.buffer, getFieldOffset(ordinal), 
input.toUnscaledLong());
-} else {
-  setNullAt(ordinal);
-}
+  public void write(int ordinal, boolean value) {
+Platform.putBoolean(holder.buffer, getFieldOffset(ordinal), value);
--- End diff --

Thanks, will fix it later.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-149998301
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-149998334
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-15251
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-15256
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44083/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-15244
  
**[Test build #44083 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44083/consoleFull)**
 for PR 9203 at commit 
[`15ebb3b`](https://github.com/apache/spark/commit/15ebb3b3794d6888e9e08a834511f66ec3934f00).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:\n  * 
`abstract class ColumnarIterator extends Iterator[InternalRow] `\n  * `class 
MutableUnsafeRow(val writer: UnsafeRowWriter) extends GenericMutableRow(null) 
`\n  * `  class SpecificColumnarIterator extends $`\n


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-14732
  
**[Test build #44083 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44083/consoleFull)**
 for PR 9203 at commit 
[`15ebb3b`](https://github.com/apache/spark/commit/15ebb3b3794d6888e9e08a834511f66ec3934f00).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-150016227
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread davies
GitHub user davies opened a pull request:

https://github.com/apache/spark/pull/9203

[SPARK-11243] [SQL] output UnsafeRow from columnar cache



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/davies/spark unsafe_cache

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9203.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9203


commit 15ebb3b3794d6888e9e08a834511f66ec3934f00
Author: Davies Liu 
Date:   2015-10-21T19:02:17Z

output UnsafeRow from columnar cache




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-150019732
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-150019702
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-150020994
  
**[Test build #44092 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44092/consoleFull)**
 for PR 9203 at commit 
[`e33170d`](https://github.com/apache/spark/commit/e33170d793993bcfc48bdc0c159b3fe51b4f0861).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/9203#discussion_r42684787
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java
 ---
@@ -95,41 +99,75 @@ public void alignToWords(int numBytes) {
 }
   }
 
-  public void writeCompactDecimal(int ordinal, Decimal input, int 
precision, int scale) {
-// make sure Decimal object has the same scale as DecimalType
-if (input.changePrecision(precision, scale)) {
-  Platform.putLong(holder.buffer, getFieldOffset(ordinal), 
input.toUnscaledLong());
-} else {
-  setNullAt(ordinal);
-}
+  public void write(int ordinal, boolean value) {
+Platform.putBoolean(holder.buffer, getFieldOffset(ordinal), value);
   }
 
-  public void write(int ordinal, Decimal input, int precision, int scale) {
-// grow the global buffer before writing data.
-holder.grow(16);
+  public void write(int ordinal, byte value) {
+Platform.putByte(holder.buffer, getFieldOffset(ordinal), value);
+  }
 
-// zero-out the bytes
-Platform.putLong(holder.buffer, holder.cursor, 0L);
-Platform.putLong(holder.buffer, holder.cursor + 8, 0L);
+  public void write(int ordinal, short value) {
+Platform.putShort(holder.buffer, getFieldOffset(ordinal), value);
+  }
 
-// Make sure Decimal object has the same scale as DecimalType.
-// Note that we may pass in null Decimal object to set null for it.
-if (input == null || !input.changePrecision(precision, scale)) {
-  BitSetMethods.set(holder.buffer, startingOffset, ordinal);
-  // keep the offset for future update
-  setOffsetAndSize(ordinal, 0L);
-} else {
-  final byte[] bytes = 
input.toJavaBigDecimal().unscaledValue().toByteArray();
-  assert bytes.length <= 16;
+  public void write(int ordinal, int value) {
+Platform.putInt(holder.buffer, getFieldOffset(ordinal), value);
+  }
 
-  // Write the bytes to the variable length portion.
-  Platform.copyMemory(
-bytes, Platform.BYTE_ARRAY_OFFSET, holder.buffer, holder.cursor, 
bytes.length);
-  setOffsetAndSize(ordinal, bytes.length);
+  public void write(int ordinal, long value) {
+Platform.putLong(holder.buffer, getFieldOffset(ordinal), value);
+  }
+
+  public void write(int ordinal, float value) {
+if (Float.isNaN(value)) {
+  value = Float.NaN;
 }
+Platform.putFloat(holder.buffer, getFieldOffset(ordinal), value);
+  }
 
-// move the cursor forward.
-holder.cursor += 16;
+  public void write(int ordinal, double value) {
+if (Double.isNaN(value)) {
+  value = Double.NaN;
+}
+Platform.putDouble(holder.buffer, getFieldOffset(ordinal), value);
+  }
+
+  public void write(int ordinal, Decimal input, int precision, int scale) {
+if (precision <= Decimal.MAX_LONG_DIGITS()) {
+  // make sure Decimal object has the same scale as DecimalType
+  if (input.changePrecision(precision, scale)) {
--- End diff --

as discussed offline, maybe change this to assert?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/9203#discussion_r42686848
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
 ---
@@ -618,6 +618,25 @@ public void writeTo(ByteBuffer buffer) {
 buffer.position(pos + sizeInBytes);
   }
 
+  /**
+   * Write the bytes of var-length field into ByteBuffer
+   */
+  public void writeFieldTo(int ordinal, ByteBuffer buffer) {
--- End diff --

should document this only supports HeapByteBuffer


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-150016965
  
**[Test build #44089 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44089/consoleFull)**
 for PR 9203 at commit 
[`81e3fad`](https://github.com/apache/spark/commit/81e3fad9462eed6d7b32b20002bf1127dd435f18).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-150016246
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/9203#discussion_r42686721
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
 ---
@@ -618,6 +618,25 @@ public void writeTo(ByteBuffer buffer) {
 buffer.position(pos + sizeInBytes);
   }
 
+  /**
+   * Write the bytes of var-length field into ByteBuffer
+   */
+  public void writeFieldTo(int ordinal, ByteBuffer buffer) {
+final long offsetAndSize = getLong(ordinal);
+final int offset = (int) (offsetAndSize >> 32);
+final int size = (int) (offsetAndSize & ((1L << 32) - 1));
--- End diff --

can we create a constant for (1L << 32) - 1?

We have a lot of them now in this file. LOWER_32_BIT_MASK?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-150034498
  
I'm really excited to see how much of a performance difference this makes 
when scanning string columns, since this could potentially provide a big perf. 
boost to the sc.textFile replacement that I showed the other day.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/9203#discussion_r42687247
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java
 ---
@@ -58,6 +58,10 @@ private void zeroOutPaddingBytes(int numBytes) {
 }
   }
 
+  public boolean isNullAt(int ordinal) {
+return BitSetMethods.isSet(holder.buffer, startingOffset, ordinal);
--- End diff --

Do we ever start from the middle of an existing buffer? If not, can we move 
startingOffset into holder class?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-150041762
  
**[Test build #44089 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44089/consoleFull)**
 for PR 9203 at commit 
[`81e3fad`](https://github.com/apache/spark/commit/81e3fad9462eed6d7b32b20002bf1127dd435f18).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:\n  * 
`class MutableUnsafeRow(val writer: UnsafeRowWriter) extends 
GenericMutableRow(null) `\n


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-150041882
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-150041884
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44089/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/9203#discussion_r42695693
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java
 ---
@@ -95,41 +99,75 @@ public void alignToWords(int numBytes) {
 }
   }
 
-  public void writeCompactDecimal(int ordinal, Decimal input, int 
precision, int scale) {
-// make sure Decimal object has the same scale as DecimalType
-if (input.changePrecision(precision, scale)) {
-  Platform.putLong(holder.buffer, getFieldOffset(ordinal), 
input.toUnscaledLong());
-} else {
-  setNullAt(ordinal);
-}
+  public void write(int ordinal, boolean value) {
+Platform.putBoolean(holder.buffer, getFieldOffset(ordinal), value);
   }
 
-  public void write(int ordinal, Decimal input, int precision, int scale) {
-// grow the global buffer before writing data.
-holder.grow(16);
+  public void write(int ordinal, byte value) {
+Platform.putByte(holder.buffer, getFieldOffset(ordinal), value);
+  }
 
-// zero-out the bytes
-Platform.putLong(holder.buffer, holder.cursor, 0L);
-Platform.putLong(holder.buffer, holder.cursor + 8, 0L);
+  public void write(int ordinal, short value) {
+Platform.putShort(holder.buffer, getFieldOffset(ordinal), value);
+  }
 
-// Make sure Decimal object has the same scale as DecimalType.
-// Note that we may pass in null Decimal object to set null for it.
-if (input == null || !input.changePrecision(precision, scale)) {
-  BitSetMethods.set(holder.buffer, startingOffset, ordinal);
-  // keep the offset for future update
-  setOffsetAndSize(ordinal, 0L);
-} else {
-  final byte[] bytes = 
input.toJavaBigDecimal().unscaledValue().toByteArray();
-  assert bytes.length <= 16;
+  public void write(int ordinal, int value) {
+Platform.putInt(holder.buffer, getFieldOffset(ordinal), value);
+  }
 
-  // Write the bytes to the variable length portion.
-  Platform.copyMemory(
-bytes, Platform.BYTE_ARRAY_OFFSET, holder.buffer, holder.cursor, 
bytes.length);
-  setOffsetAndSize(ordinal, bytes.length);
+  public void write(int ordinal, long value) {
+Platform.putLong(holder.buffer, getFieldOffset(ordinal), value);
+  }
+
+  public void write(int ordinal, float value) {
+if (Float.isNaN(value)) {
+  value = Float.NaN;
 }
+Platform.putFloat(holder.buffer, getFieldOffset(ordinal), value);
+  }
 
-// move the cursor forward.
-holder.cursor += 16;
+  public void write(int ordinal, double value) {
+if (Double.isNaN(value)) {
+  value = Double.NaN;
+}
+Platform.putDouble(holder.buffer, getFieldOffset(ordinal), value);
+  }
+
+  public void write(int ordinal, Decimal input, int precision, int scale) {
+if (precision <= Decimal.MAX_LONG_DIGITS()) {
+  // make sure Decimal object has the same scale as DecimalType
+  if (input.changePrecision(precision, scale)) {
--- End diff --

will do in separate PR


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/9203#discussion_r42695679
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
 ---
@@ -618,6 +618,25 @@ public void writeTo(ByteBuffer buffer) {
 buffer.position(pos + sizeInBytes);
   }
 
+  /**
+   * Write the bytes of var-length field into ByteBuffer
+   */
+  public void writeFieldTo(int ordinal, ByteBuffer buffer) {
+final long offsetAndSize = getLong(ordinal);
+final int offset = (int) (offsetAndSize >> 32);
+final int size = (int) (offsetAndSize & ((1L << 32) - 1));
--- End diff --

Actually, there are not needed, `(int) offsetAndSize` should be enough


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/9203#discussion_r42687468
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java
 ---
@@ -95,41 +99,75 @@ public void alignToWords(int numBytes) {
 }
   }
 
-  public void writeCompactDecimal(int ordinal, Decimal input, int 
precision, int scale) {
-// make sure Decimal object has the same scale as DecimalType
-if (input.changePrecision(precision, scale)) {
-  Platform.putLong(holder.buffer, getFieldOffset(ordinal), 
input.toUnscaledLong());
-} else {
-  setNullAt(ordinal);
-}
+  public void write(int ordinal, boolean value) {
+Platform.putBoolean(holder.buffer, getFieldOffset(ordinal), value);
   }
 
-  public void write(int ordinal, Decimal input, int precision, int scale) {
-// grow the global buffer before writing data.
-holder.grow(16);
+  public void write(int ordinal, byte value) {
+Platform.putByte(holder.buffer, getFieldOffset(ordinal), value);
+  }
 
-// zero-out the bytes
-Platform.putLong(holder.buffer, holder.cursor, 0L);
-Platform.putLong(holder.buffer, holder.cursor + 8, 0L);
+  public void write(int ordinal, short value) {
+Platform.putShort(holder.buffer, getFieldOffset(ordinal), value);
+  }
 
-// Make sure Decimal object has the same scale as DecimalType.
-// Note that we may pass in null Decimal object to set null for it.
-if (input == null || !input.changePrecision(precision, scale)) {
-  BitSetMethods.set(holder.buffer, startingOffset, ordinal);
-  // keep the offset for future update
-  setOffsetAndSize(ordinal, 0L);
-} else {
-  final byte[] bytes = 
input.toJavaBigDecimal().unscaledValue().toByteArray();
-  assert bytes.length <= 16;
+  public void write(int ordinal, int value) {
+Platform.putInt(holder.buffer, getFieldOffset(ordinal), value);
+  }
 
-  // Write the bytes to the variable length portion.
-  Platform.copyMemory(
-bytes, Platform.BYTE_ARRAY_OFFSET, holder.buffer, holder.cursor, 
bytes.length);
-  setOffsetAndSize(ordinal, bytes.length);
+  public void write(int ordinal, long value) {
+Platform.putLong(holder.buffer, getFieldOffset(ordinal), value);
+  }
+
+  public void write(int ordinal, float value) {
+if (Float.isNaN(value)) {
+  value = Float.NaN;
 }
+Platform.putFloat(holder.buffer, getFieldOffset(ordinal), value);
+  }
 
-// move the cursor forward.
-holder.cursor += 16;
+  public void write(int ordinal, double value) {
+if (Double.isNaN(value)) {
+  value = Double.NaN;
+}
+Platform.putDouble(holder.buffer, getFieldOffset(ordinal), value);
+  }
+
+  public void write(int ordinal, Decimal input, int precision, int scale) {
+if (precision <= Decimal.MAX_LONG_DIGITS()) {
+  // make sure Decimal object has the same scale as DecimalType
+  if (input.changePrecision(precision, scale)) {
+Platform.putLong(holder.buffer, getFieldOffset(ordinal), 
input.toUnscaledLong());
+  } else {
+setNullAt(ordinal);
+  }
+} else {
+  // grow the global buffer before writing data.
+  holder.grow(16);
+
+  // zero-out the bytes
+  Platform.putLong(holder.buffer, holder.cursor, 0L);
+  Platform.putLong(holder.buffer, holder.cursor + 8, 0L);
+
+  // Make sure Decimal object has the same scale as DecimalType.
+  // Note that we may pass in null Decimal object to set null for it.
+  if (input == null || !input.changePrecision(precision, scale)) {
--- End diff --

assert the 2nd part?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/9203#discussion_r42688421
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/columnar/GenerateColumnAccessor.scala
 ---
@@ -20,18 +20,42 @@ package org.apache.spark.sql.columnar
 import org.apache.spark.Logging
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions._
-import org.apache.spark.sql.catalyst.expressions.codegen.{CodeFormatter, 
CodeGenerator}
+import org.apache.spark.sql.catalyst.expressions.codegen.{UnsafeRowWriter, 
CodeFormatter, CodeGenerator}
 import org.apache.spark.sql.types._
 
 /**
- * An Iterator to walk throught the InternalRows from a CachedBatch
+ * An Iterator to walk through the InternalRows from a CachedBatch
  */
 abstract class ColumnarIterator extends Iterator[InternalRow] {
-  def initialize(input: Iterator[CachedBatch], mutableRow: MutableRow, 
columnTypes: Array[DataType],
+  def initialize(input: Iterator[CachedBatch], columnTypes: 
Array[DataType],
 columnIndexes: Array[Int]): Unit
 }
 
 /**
+ * An helper class to update the fields of UnsafeRow, used by 
ColumnAccessor
--- End diff --

should put a giant warning here that writes must be in increasing order of 
ordinals


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-150038460
  
LGTM otherwise.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-150044864
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44092/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-150044863
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/9203#discussion_r42692096
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java
 ---
@@ -95,41 +99,75 @@ public void alignToWords(int numBytes) {
 }
   }
 
-  public void writeCompactDecimal(int ordinal, Decimal input, int 
precision, int scale) {
-// make sure Decimal object has the same scale as DecimalType
-if (input.changePrecision(precision, scale)) {
-  Platform.putLong(holder.buffer, getFieldOffset(ordinal), 
input.toUnscaledLong());
-} else {
-  setNullAt(ordinal);
-}
+  public void write(int ordinal, boolean value) {
+Platform.putBoolean(holder.buffer, getFieldOffset(ordinal), value);
   }
 
-  public void write(int ordinal, Decimal input, int precision, int scale) {
-// grow the global buffer before writing data.
-holder.grow(16);
+  public void write(int ordinal, byte value) {
+Platform.putByte(holder.buffer, getFieldOffset(ordinal), value);
+  }
 
-// zero-out the bytes
-Platform.putLong(holder.buffer, holder.cursor, 0L);
-Platform.putLong(holder.buffer, holder.cursor + 8, 0L);
+  public void write(int ordinal, short value) {
+Platform.putShort(holder.buffer, getFieldOffset(ordinal), value);
+  }
 
-// Make sure Decimal object has the same scale as DecimalType.
-// Note that we may pass in null Decimal object to set null for it.
-if (input == null || !input.changePrecision(precision, scale)) {
-  BitSetMethods.set(holder.buffer, startingOffset, ordinal);
-  // keep the offset for future update
-  setOffsetAndSize(ordinal, 0L);
-} else {
-  final byte[] bytes = 
input.toJavaBigDecimal().unscaledValue().toByteArray();
-  assert bytes.length <= 16;
+  public void write(int ordinal, int value) {
+Platform.putInt(holder.buffer, getFieldOffset(ordinal), value);
+  }
 
-  // Write the bytes to the variable length portion.
-  Platform.copyMemory(
-bytes, Platform.BYTE_ARRAY_OFFSET, holder.buffer, holder.cursor, 
bytes.length);
-  setOffsetAndSize(ordinal, bytes.length);
+  public void write(int ordinal, long value) {
+Platform.putLong(holder.buffer, getFieldOffset(ordinal), value);
+  }
+
+  public void write(int ordinal, float value) {
+if (Float.isNaN(value)) {
+  value = Float.NaN;
 }
+Platform.putFloat(holder.buffer, getFieldOffset(ordinal), value);
+  }
 
-// move the cursor forward.
-holder.cursor += 16;
+  public void write(int ordinal, double value) {
+if (Double.isNaN(value)) {
+  value = Double.NaN;
+}
+Platform.putDouble(holder.buffer, getFieldOffset(ordinal), value);
+  }
+
+  public void write(int ordinal, Decimal input, int precision, int scale) {
+if (precision <= Decimal.MAX_LONG_DIGITS()) {
+  // make sure Decimal object has the same scale as DecimalType
+  if (input.changePrecision(precision, scale)) {
+Platform.putLong(holder.buffer, getFieldOffset(ordinal), 
input.toUnscaledLong());
+  } else {
+setNullAt(ordinal);
+  }
+} else {
+  // grow the global buffer before writing data.
+  holder.grow(16);
+
+  // zero-out the bytes
+  Platform.putLong(holder.buffer, holder.cursor, 0L);
+  Platform.putLong(holder.buffer, holder.cursor + 8, 0L);
+
+  // Make sure Decimal object has the same scale as DecimalType.
+  // Note that we may pass in null Decimal object to set null for it.
+  if (input == null || !input.changePrecision(precision, scale)) {
--- End diff --

Will do in separate PR


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/9203#discussion_r42692044
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java
 ---
@@ -58,6 +58,10 @@ private void zeroOutPaddingBytes(int numBytes) {
 }
   }
 
+  public boolean isNullAt(int ordinal) {
+return BitSetMethods.isSet(holder.buffer, startingOffset, ordinal);
--- End diff --

Yes, in UnsafeProjection, there could be multiple UnsafeRowWriter share a 
single BufferHolder.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-150044370
  
@JoshRosen If we only scan the cached string and access it (without adding 
a ConvertToUnsafe), this could be a little bit slower, because we copy the 
bytes into UnsafeRow now. For the UTF8String object, we still will create it 
anyway, before this patch, we create it in ColumnAccessor, after this patch, we 
create in UnsafeRow.getUTF8String() (we could re-use that actually).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-150044743
  
**[Test build #44092 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44092/consoleFull)**
 for PR 9203 at commit 
[`e33170d`](https://github.com/apache/spark/commit/e33170d793993bcfc48bdc0c159b3fe51b4f0861).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:\n  * 
`class MutableUnsafeRow(val writer: UnsafeRowWriter) extends 
GenericMutableRow(null) `\n


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-150075430
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44100/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-150075426
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-150075109
  
**[Test build #44100 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44100/consoleFull)**
 for PR 9203 at commit 
[`cab9286`](https://github.com/apache/spark/commit/cab9286b15bd784a023cd2b2c5c3ad755ebb566b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:\n  * 
`class MutableUnsafeRow(val writer: UnsafeRowWriter) extends 
GenericMutableRow(null) `\n


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-150084996
  
I'm going to merge this first. @davies please take a look at @cloud-fan 's 
comments, and address them in follow-up prs if necessary.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/9203


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/9203#discussion_r42700611
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java
 ---
@@ -64,29 +63,72 @@ public void setOffset(int ordinal) {
 Platform.putInt(holder.buffer, getElementOffset(ordinal), 
relativeOffset);
   }
 
-  public void writeCompactDecimal(int ordinal, Decimal input, int 
precision, int scale) {
-// make sure Decimal object has the same scale as DecimalType
-if (input.changePrecision(precision, scale)) {
-  Platform.putLong(holder.buffer, holder.cursor, 
input.toUnscaledLong());
-  setOffset(ordinal);
-  holder.cursor += 8;
-} else {
-  setNullAt(ordinal);
+  public void write(int ordinal, boolean value) {
--- End diff --

Why implement these primitive write method in writer class not in codegen?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/9203#discussion_r42701316
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java
 ---
@@ -64,29 +63,72 @@ public void setOffset(int ordinal) {
 Platform.putInt(holder.buffer, getElementOffset(ordinal), 
relativeOffset);
   }
 
-  public void writeCompactDecimal(int ordinal, Decimal input, int 
precision, int scale) {
-// make sure Decimal object has the same scale as DecimalType
-if (input.changePrecision(precision, scale)) {
-  Platform.putLong(holder.buffer, holder.cursor, 
input.toUnscaledLong());
-  setOffset(ordinal);
-  holder.cursor += 8;
-} else {
-  setNullAt(ordinal);
+  public void write(int ordinal, boolean value) {
--- End diff --

Those in UnsafeRowWriter are needed for ColumnAccessor. The purpose of 
these *Writer is to simplify generated code (which is already hard to 
understand), these method also go in this direction. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/9203#discussion_r42701494
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java
 ---
@@ -64,29 +63,72 @@ public void setOffset(int ordinal) {
 Platform.putInt(holder.buffer, getElementOffset(ordinal), 
relativeOffset);
   }
 
-  public void writeCompactDecimal(int ordinal, Decimal input, int 
precision, int scale) {
-// make sure Decimal object has the same scale as DecimalType
-if (input.changePrecision(precision, scale)) {
-  Platform.putLong(holder.buffer, holder.cursor, 
input.toUnscaledLong());
-  setOffset(ordinal);
-  holder.cursor += 8;
-} else {
-  setNullAt(ordinal);
+  public void write(int ordinal, boolean value) {
--- End diff --

I think it is simpler to put them there rather than in generated code.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/9203#discussion_r42701674
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java
 ---
@@ -58,6 +58,10 @@ private void zeroOutPaddingBytes(int numBytes) {
 }
   }
 
+  public boolean isNullAt(int ordinal) {
--- End diff --

where do we use this method?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/9203#discussion_r42702161
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java
 ---
@@ -95,41 +99,75 @@ public void alignToWords(int numBytes) {
 }
   }
 
-  public void writeCompactDecimal(int ordinal, Decimal input, int 
precision, int scale) {
-// make sure Decimal object has the same scale as DecimalType
-if (input.changePrecision(precision, scale)) {
-  Platform.putLong(holder.buffer, getFieldOffset(ordinal), 
input.toUnscaledLong());
-} else {
-  setNullAt(ordinal);
-}
+  public void write(int ordinal, boolean value) {
+Platform.putBoolean(holder.buffer, getFieldOffset(ordinal), value);
--- End diff --

I think we should zero-out the whole 8 bytes before writing.
For inner unsafe row, the start position of it may change during mulitly 
wirtes. So these 8 bytes are not always clean, if we only write a boolean here, 
the remaing 7 bytes maybe not zero, and will break the equality of unsafe row.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/9203#discussion_r42702305
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala
 ---
@@ -124,17 +124,10 @@ object GenerateUnsafeProjection extends 
CodeGenerator[Seq[Expression], UnsafePro
 """
 
   case _ if ctx.isPrimitiveType(dt) =>
-val fieldOffset = ctx.freshName("fieldOffset")
 s"""
-  final long $fieldOffset = $rowWriter.getFieldOffset($index);
-  Platform.putLong($bufferHolder.buffer, $fieldOffset, 0L);
-  ${writePrimitiveType(ctx, input.value, dt, 
s"$bufferHolder.buffer", fieldOffset)}
+  $rowWriter.write($index, ${input.value});
 """
 
-  case t: DecimalType if t.precision <= Decimal.MAX_LONG_DIGITS =>
--- End diff --

I think a benefit of this is removing a if branch as we know the decimal 
type before generating code. Will this affect performance?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-150070715
  
**[Test build #44098 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44098/consoleFull)**
 for PR 9203 at commit 
[`1421bce`](https://github.com/apache/spark/commit/1421bce528a460ecacaef69524cd28cfbc8a0057).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:\n  * 
`class MutableUnsafeRow(val writer: UnsafeRowWriter) extends 
GenericMutableRow(null) `\n


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-150070861
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44098/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-150070859
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/9203#discussion_r42703195
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala
 ---
@@ -124,17 +124,10 @@ object GenerateUnsafeProjection extends 
CodeGenerator[Seq[Expression], UnsafePro
 """
 
   case _ if ctx.isPrimitiveType(dt) =>
-val fieldOffset = ctx.freshName("fieldOffset")
 s"""
-  final long $fieldOffset = $rowWriter.getFieldOffset($index);
-  Platform.putLong($bufferHolder.buffer, $fieldOffset, 0L);
-  ${writePrimitiveType(ctx, input.value, dt, 
s"$bufferHolder.buffer", fieldOffset)}
+  $rowWriter.write($index, ${input.value});
 """
 
-  case t: DecimalType if t.precision <= Decimal.MAX_LONG_DIGITS =>
--- End diff --

it won't matter as much, since the branch predictor can mostly eliminate 
them.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-150053054
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-150053026
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-150054848
  
**[Test build #44100 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44100/consoleFull)**
 for PR 9203 at commit 
[`cab9286`](https://github.com/apache/spark/commit/cab9286b15bd784a023cd2b2c5c3ad755ebb566b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-150053373
  
**[Test build #44098 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44098/consoleFull)**
 for PR 9203 at commit 
[`1421bce`](https://github.com/apache/spark/commit/1421bce528a460ecacaef69524cd28cfbc8a0057).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-150053787
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...

2015-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9203#issuecomment-150053767
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org