[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/9203#discussion_r42711166

Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java

```java
@@ -58,6 +58,10 @@ private void zeroOutPaddingBytes(int numBytes) {
     }
   }

+  public boolean isNullAt(int ordinal) {
```

This is needed by larger Decimals, so that write(decimal) can be called after setNullAt().

---

If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
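For context, the compact path discussed here stores any decimal whose precision fits within Decimal.MAX_LONG_DIGITS as a single 8-byte unscaled long, which is why a failed changePrecision must fall back to setNullAt. A minimal standalone sketch of that encoding (the value 18 for MAX_LONG_DIGITS and the class/method names are assumptions for illustration, not Spark's API):

```java
import java.math.BigDecimal;

public class CompactDecimalDemo {
    // Assumption: mirrors the rule that decimals with precision <= 18
    // digits fit in a long as the unscaled value.
    static final int MAX_LONG_DIGITS = 18;

    // Returns the unscaled long for a decimal that fits the compact path,
    // e.g. 123.45 at scale 2 becomes 12345.
    static long toUnscaledLong(BigDecimal d, int precision, int scale) {
        if (precision > MAX_LONG_DIGITS) {
            throw new IllegalArgumentException("use the 16-byte variable-length path instead");
        }
        return d.setScale(scale).unscaledValue().longValueExact();
    }

    public static void main(String[] args) {
        System.out.println(toUnscaledLong(new BigDecimal("123.45"), 5, 2)); // prints 12345
    }
}
```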
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/9203#discussion_r42711131

Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java

```java
@@ -95,41 +99,75 @@ public void alignToWords(int numBytes) {
     }
   }

-  public void writeCompactDecimal(int ordinal, Decimal input, int precision, int scale) {
-    // make sure Decimal object has the same scale as DecimalType
-    if (input.changePrecision(precision, scale)) {
-      Platform.putLong(holder.buffer, getFieldOffset(ordinal), input.toUnscaledLong());
-    } else {
-      setNullAt(ordinal);
-    }
+  public void write(int ordinal, boolean value) {
+    Platform.putBoolean(holder.buffer, getFieldOffset(ordinal), value);
```

Thanks, will fix it later.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-149998301

Merged build triggered.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-149998334

Merged build started.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-15251

Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-15256

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44083/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-15244

**[Test build #44083 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44083/consoleFull)** for PR 9203 at commit [`15ebb3b`](https://github.com/apache/spark/commit/15ebb3b3794d6888e9e08a834511f66ec3934f00).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `abstract class ColumnarIterator extends Iterator[InternalRow]`
  * `class MutableUnsafeRow(val writer: UnsafeRowWriter) extends GenericMutableRow(null)`
  * `class SpecificColumnarIterator extends $`
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-14732

**[Test build #44083 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44083/consoleFull)** for PR 9203 at commit [`15ebb3b`](https://github.com/apache/spark/commit/15ebb3b3794d6888e9e08a834511f66ec3934f00).
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-150016227

Merged build triggered.
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/9203

[SPARK-11243] [SQL] output UnsafeRow from columnar cache

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark unsafe_cache

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9203.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #9203

commit 15ebb3b3794d6888e9e08a834511f66ec3934f00
Author: Davies Liu
Date: 2015-10-21T19:02:17Z

    output UnsafeRow from columnar cache
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-150019732

Merged build started.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-150019702

Merged build triggered.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-150020994

**[Test build #44092 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44092/consoleFull)** for PR 9203 at commit [`e33170d`](https://github.com/apache/spark/commit/e33170d793993bcfc48bdc0c159b3fe51b4f0861).
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/9203#discussion_r42684787

Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java

```java
@@ -95,41 +99,75 @@ public void alignToWords(int numBytes) {
     }
   }

-  public void writeCompactDecimal(int ordinal, Decimal input, int precision, int scale) {
-    // make sure Decimal object has the same scale as DecimalType
-    if (input.changePrecision(precision, scale)) {
-      Platform.putLong(holder.buffer, getFieldOffset(ordinal), input.toUnscaledLong());
-    } else {
-      setNullAt(ordinal);
-    }
+  public void write(int ordinal, boolean value) {
+    Platform.putBoolean(holder.buffer, getFieldOffset(ordinal), value);
   }

-  public void write(int ordinal, Decimal input, int precision, int scale) {
-    // grow the global buffer before writing data.
-    holder.grow(16);
+  public void write(int ordinal, byte value) {
+    Platform.putByte(holder.buffer, getFieldOffset(ordinal), value);
+  }

-    // zero-out the bytes
-    Platform.putLong(holder.buffer, holder.cursor, 0L);
-    Platform.putLong(holder.buffer, holder.cursor + 8, 0L);
+  public void write(int ordinal, short value) {
+    Platform.putShort(holder.buffer, getFieldOffset(ordinal), value);
+  }

-    // Make sure Decimal object has the same scale as DecimalType.
-    // Note that we may pass in null Decimal object to set null for it.
-    if (input == null || !input.changePrecision(precision, scale)) {
-      BitSetMethods.set(holder.buffer, startingOffset, ordinal);
-      // keep the offset for future update
-      setOffsetAndSize(ordinal, 0L);
-    } else {
-      final byte[] bytes = input.toJavaBigDecimal().unscaledValue().toByteArray();
-      assert bytes.length <= 16;
+  public void write(int ordinal, int value) {
+    Platform.putInt(holder.buffer, getFieldOffset(ordinal), value);
+  }

-      // Write the bytes to the variable length portion.
-      Platform.copyMemory(
-        bytes, Platform.BYTE_ARRAY_OFFSET, holder.buffer, holder.cursor, bytes.length);
-      setOffsetAndSize(ordinal, bytes.length);
+  public void write(int ordinal, long value) {
+    Platform.putLong(holder.buffer, getFieldOffset(ordinal), value);
+  }
+
+  public void write(int ordinal, float value) {
+    if (Float.isNaN(value)) {
+      value = Float.NaN;
     }
+    Platform.putFloat(holder.buffer, getFieldOffset(ordinal), value);
+  }

-    // move the cursor forward.
-    holder.cursor += 16;
+  public void write(int ordinal, double value) {
+    if (Double.isNaN(value)) {
+      value = Double.NaN;
+    }
+    Platform.putDouble(holder.buffer, getFieldOffset(ordinal), value);
+  }
+
+  public void write(int ordinal, Decimal input, int precision, int scale) {
+    if (precision <= Decimal.MAX_LONG_DIGITS()) {
+      // make sure Decimal object has the same scale as DecimalType
+      if (input.changePrecision(precision, scale)) {
```

As discussed offline, maybe change this to an assert?
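The write(float) and write(double) overloads in the diff above normalize NaN before storing it. A small standalone demo of why: Java floats admit many NaN bit patterns, and canonicalizing to Float.NaN keeps byte-wise comparison and hashing of serialized rows stable (NaNNormalize and its helper are hypothetical names for illustration, not Spark code):

```java
public class NaNNormalize {
    // Any value with an all-ones exponent and non-zero mantissa is a NaN;
    // Float.NaN is the canonical pattern 0x7fc00000.
    static float normalize(float v) {
        return Float.isNaN(v) ? Float.NaN : v;
    }

    public static void main(String[] args) {
        // A NaN with a non-canonical payload bit set:
        float weird = Float.intBitsToFloat(0x7fc00001);
        // Raw bits differ from the canonical NaN...
        System.out.println(Float.floatToRawIntBits(weird) == Float.floatToRawIntBits(Float.NaN)); // false
        // ...but agree after normalization.
        System.out.println(Float.floatToRawIntBits(normalize(weird)) == Float.floatToRawIntBits(Float.NaN)); // true
    }
}
```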
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/9203#discussion_r42686848

Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java

```java
@@ -618,6 +618,25 @@ public void writeTo(ByteBuffer buffer) {
     buffer.position(pos + sizeInBytes);
   }

+  /**
+   * Write the bytes of var-length field into ByteBuffer
+   */
+  public void writeFieldTo(int ordinal, ByteBuffer buffer) {
```

Should document that this only supports HeapByteBuffer.
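A sketch of why the heap-only restriction matters: a bulk copy into a ByteBuffer's backing byte[] requires hasArray() to be true, and direct (off-heap) buffers expose no accessible array. The class and method names below are hypothetical, written only to illustrate the distinction, not Spark's implementation:

```java
import java.nio.ByteBuffer;

public class HeapBufferOnly {
    // Copies src into dst via the backing array; rejects direct buffers,
    // which have no byte[] to copy into.
    static void writeBytes(byte[] src, ByteBuffer dst) {
        if (!dst.hasArray()) {
            throw new IllegalArgumentException("only heap ByteBuffers are supported");
        }
        int pos = dst.position();
        System.arraycopy(src, 0, dst.array(), dst.arrayOffset() + pos, src.length);
        dst.position(pos + src.length);
    }

    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(8);
        writeBytes(new byte[]{1, 2, 3}, heap);
        System.out.println(heap.position()); // prints 3

        ByteBuffer direct = ByteBuffer.allocateDirect(8);
        System.out.println(direct.hasArray()); // prints false
    }
}
```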
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-150016965

**[Test build #44089 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44089/consoleFull)** for PR 9203 at commit [`81e3fad`](https://github.com/apache/spark/commit/81e3fad9462eed6d7b32b20002bf1127dd435f18).
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-150016246

Merged build started.
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/9203#discussion_r42686721

Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java

```java
+  public void writeFieldTo(int ordinal, ByteBuffer buffer) {
+    final long offsetAndSize = getLong(ordinal);
+    final int offset = (int) (offsetAndSize >> 32);
+    final int size = (int) (offsetAndSize & ((1L << 32) - 1));
```

Can we create a constant for `(1L << 32) - 1`? We have a lot of them now in this file. `LOWER_32_BIT_MASK`?
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-150034498

I'm really excited to see how much of a performance difference this makes when scanning string columns, since it could provide a big perf boost to the sc.textFile replacement I showed the other day.
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/9203#discussion_r42687247

Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java

```java
+  public boolean isNullAt(int ordinal) {
+    return BitSetMethods.isSet(holder.buffer, startingOffset, ordinal);
```

Do we ever start from the middle of an existing buffer? If not, can we move startingOffset into the holder class?
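The BitSetMethods.isSet call above reads one bit from the row's null-tracking header words. A minimal standalone sketch of that kind of word-aligned bitset, where bit i of the header marks field i as null (NullBitSet is a hypothetical class operating on a plain long[] rather than Spark's raw buffer):

```java
public class NullBitSet {
    // Mark field `index` as null: set bit (index mod 64) in word (index / 64).
    static void set(long[] words, int index) {
        words[index >> 6] |= 1L << (index & 63);
    }

    // Test whether field `index` is null.
    static boolean isSet(long[] words, int index) {
        return (words[index >> 6] & (1L << (index & 63))) != 0;
    }

    public static void main(String[] args) {
        long[] nullBits = new long[2]; // two words cover 128 fields
        set(nullBits, 70);
        System.out.println(isSet(nullBits, 70)); // prints true
        System.out.println(isSet(nullBits, 69)); // prints false
    }
}
```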
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-150041762

**[Test build #44089 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44089/consoleFull)** for PR 9203 at commit [`81e3fad`](https://github.com/apache/spark/commit/81e3fad9462eed6d7b32b20002bf1127dd435f18).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class MutableUnsafeRow(val writer: UnsafeRowWriter) extends GenericMutableRow(null)`
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-150041882

Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-150041884

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44089/
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/9203#discussion_r42695693

Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java

```java
+  public void write(int ordinal, Decimal input, int precision, int scale) {
+    if (precision <= Decimal.MAX_LONG_DIGITS()) {
+      // make sure Decimal object has the same scale as DecimalType
+      if (input.changePrecision(precision, scale)) {
```

Will do in a separate PR.
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/9203#discussion_r42695679

Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java

```java
+  public void writeFieldTo(int ordinal, ByteBuffer buffer) {
+    final long offsetAndSize = getLong(ordinal);
+    final int offset = (int) (offsetAndSize >> 32);
+    final int size = (int) (offsetAndSize & ((1L << 32) - 1));
```

Actually, those are not needed; `(int) offsetAndSize` should be enough.
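A quick demonstration that the cast and the mask agree: narrowing a long to int keeps exactly the low 32 bits, so `(int) offsetAndSize` equals `(int) (offsetAndSize & ((1L << 32) - 1))` for any packed value (OffsetAndSize is a hypothetical helper for illustration):

```java
public class OffsetAndSize {
    // Pack a 32-bit offset into the high half and a 32-bit size into the low half.
    static long pack(int offset, int size) {
        return ((long) offset << 32) | (size & 0xffffffffL);
    }

    public static void main(String[] args) {
        long packed = pack(128, 24);
        int offset = (int) (packed >> 32);
        // The explicit lower-32-bit mask and a plain int cast agree:
        int sizeMasked = (int) (packed & ((1L << 32) - 1));
        int sizeCast = (int) packed;
        System.out.println(offset);                 // prints 128
        System.out.println(sizeMasked == sizeCast); // prints true
    }
}
```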
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/9203#discussion_r42687468

Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java

```java
+  public void write(int ordinal, Decimal input, int precision, int scale) {
+    if (precision <= Decimal.MAX_LONG_DIGITS()) {
+      // make sure Decimal object has the same scale as DecimalType
+      if (input.changePrecision(precision, scale)) {
+        Platform.putLong(holder.buffer, getFieldOffset(ordinal), input.toUnscaledLong());
+      } else {
+        setNullAt(ordinal);
+      }
+    } else {
+      // grow the global buffer before writing data.
+      holder.grow(16);
+
+      // zero-out the bytes
+      Platform.putLong(holder.buffer, holder.cursor, 0L);
+      Platform.putLong(holder.buffer, holder.cursor + 8, 0L);
+
+      // Make sure Decimal object has the same scale as DecimalType.
+      // Note that we may pass in null Decimal object to set null for it.
+      if (input == null || !input.changePrecision(precision, scale)) {
```

Assert the 2nd part?
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/9203#discussion_r42688421

Diff: sql/core/src/main/scala/org/apache/spark/sql/columnar/GenerateColumnAccessor.scala

```scala
@@ -20,18 +20,42 @@ package org.apache.spark.sql.columnar

 import org.apache.spark.Logging
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions._
-import org.apache.spark.sql.catalyst.expressions.codegen.{CodeFormatter, CodeGenerator}
+import org.apache.spark.sql.catalyst.expressions.codegen.{UnsafeRowWriter, CodeFormatter, CodeGenerator}
 import org.apache.spark.sql.types._

 /**
- * An Iterator to walk throught the InternalRows from a CachedBatch
+ * An Iterator to walk through the InternalRows from a CachedBatch
  */
 abstract class ColumnarIterator extends Iterator[InternalRow] {
-  def initialize(input: Iterator[CachedBatch], mutableRow: MutableRow, columnTypes: Array[DataType],
+  def initialize(input: Iterator[CachedBatch], columnTypes: Array[DataType],
     columnIndexes: Array[Int]): Unit
 }

 /**
+ * An helper class to update the fields of UnsafeRow, used by ColumnAccessor
```

Should put a giant warning here that writes must be in increasing order of ordinals.
[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-150038460 LGTM otherwise.
[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-150044864 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44092/
[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-150044863 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/9203#discussion_r42692096

--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java ---

@@ -95,41 +99,75 @@ public void alignToWords(int numBytes) {
     }
   }

-  public void writeCompactDecimal(int ordinal, Decimal input, int precision, int scale) {
-    // make sure Decimal object has the same scale as DecimalType
-    if (input.changePrecision(precision, scale)) {
-      Platform.putLong(holder.buffer, getFieldOffset(ordinal), input.toUnscaledLong());
-    } else {
-      setNullAt(ordinal);
-    }
+  public void write(int ordinal, boolean value) {
+    Platform.putBoolean(holder.buffer, getFieldOffset(ordinal), value);
   }

-  public void write(int ordinal, Decimal input, int precision, int scale) {
-    // grow the global buffer before writing data.
-    holder.grow(16);
+  public void write(int ordinal, byte value) {
+    Platform.putByte(holder.buffer, getFieldOffset(ordinal), value);
+  }

-    // zero-out the bytes
-    Platform.putLong(holder.buffer, holder.cursor, 0L);
-    Platform.putLong(holder.buffer, holder.cursor + 8, 0L);
+  public void write(int ordinal, short value) {
+    Platform.putShort(holder.buffer, getFieldOffset(ordinal), value);
+  }

-    // Make sure Decimal object has the same scale as DecimalType.
-    // Note that we may pass in null Decimal object to set null for it.
-    if (input == null || !input.changePrecision(precision, scale)) {
-      BitSetMethods.set(holder.buffer, startingOffset, ordinal);
-      // keep the offset for future update
-      setOffsetAndSize(ordinal, 0L);
-    } else {
-      final byte[] bytes = input.toJavaBigDecimal().unscaledValue().toByteArray();
-      assert bytes.length <= 16;
+  public void write(int ordinal, int value) {
+    Platform.putInt(holder.buffer, getFieldOffset(ordinal), value);
+  }

-      // Write the bytes to the variable length portion.
-      Platform.copyMemory(
-        bytes, Platform.BYTE_ARRAY_OFFSET, holder.buffer, holder.cursor, bytes.length);
-      setOffsetAndSize(ordinal, bytes.length);
+  public void write(int ordinal, long value) {
+    Platform.putLong(holder.buffer, getFieldOffset(ordinal), value);
+  }
+
+  public void write(int ordinal, float value) {
+    if (Float.isNaN(value)) {
+      value = Float.NaN;
     }
+    Platform.putFloat(holder.buffer, getFieldOffset(ordinal), value);
+  }

-    // move the cursor forward.
-    holder.cursor += 16;
+  public void write(int ordinal, double value) {
+    if (Double.isNaN(value)) {
+      value = Double.NaN;
+    }
+    Platform.putDouble(holder.buffer, getFieldOffset(ordinal), value);
+  }
+
+  public void write(int ordinal, Decimal input, int precision, int scale) {
+    if (precision <= Decimal.MAX_LONG_DIGITS()) {
+      // make sure Decimal object has the same scale as DecimalType
+      if (input.changePrecision(precision, scale)) {
+        Platform.putLong(holder.buffer, getFieldOffset(ordinal), input.toUnscaledLong());
+      } else {
+        setNullAt(ordinal);
+      }
+    } else {
+      // grow the global buffer before writing data.
+      holder.grow(16);
+
+      // zero-out the bytes
+      Platform.putLong(holder.buffer, holder.cursor, 0L);
+      Platform.putLong(holder.buffer, holder.cursor + 8, 0L);
+
+      // Make sure Decimal object has the same scale as DecimalType.
+      // Note that we may pass in null Decimal object to set null for it.
+      if (input == null || !input.changePrecision(precision, scale)) {
--- End diff --

Will do in separate PR
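A note on why `write(int, float)` and `write(int, double)` in the quoted hunk collapse every NaN to the canonical one: NaN has many bit patterns, and UnsafeRow equality and hashing operate on raw bytes, so unnormalized NaNs would make equal rows compare unequal. The sketch below is standalone (the `normalize` name is ours, not Spark's) and assumes a JVM where `intBitsToFloat` preserves non-canonical NaN payloads, which holds on mainstream hardware.

```java
public class NaNNormalize {
  // The writer's normalization step before Platform.putFloat.
  static float normalize(float value) {
    return Float.isNaN(value) ? Float.NaN : value;
  }

  public static void main(String[] args) {
    int canonical = Float.floatToRawIntBits(Float.NaN);  // 0x7fc00000
    float oddNaN = Float.intBitsToFloat(0x7fc00001);     // a different NaN payload
    // oddNaN is semantically NaN but byte-wise distinct from Float.NaN;
    // after normalization both write the same 4 bytes into the row.
    System.out.println(Float.isNaN(oddNaN));
    System.out.println(Float.floatToRawIntBits(normalize(oddNaN)) == canonical);
  }
}
```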
[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/9203#discussion_r42692044

--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java ---

@@ -58,6 +58,10 @@ private void zeroOutPaddingBytes(int numBytes) {
     }
   }

+  public boolean isNullAt(int ordinal) {
+    return BitSetMethods.isSet(holder.buffer, startingOffset, ordinal);
--- End diff --

Yes, in UnsafeProjection there can be multiple UnsafeRowWriters sharing a single BufferHolder.
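The `isNullAt` being discussed reads the word-aligned null bitset stored at the start of each row. The class below is a standalone sketch of that arithmetic (bit `ordinal` of a `long[]` header marks the field null), not Spark's `BitSetMethods`, which works on raw memory via `Platform`.

```java
public class NullBitSet {
  private final long[] words;

  NullBitSet(int numFields) {
    words = new long[(numFields + 63) / 64];   // one bit per field, word-aligned
  }

  // What setNullAt flips on: bit (ordinal % 64) of word (ordinal / 64).
  void setNullAt(int ordinal) {
    words[ordinal >>> 6] |= 1L << (ordinal & 63);
  }

  // What isNullAt reads back.
  boolean isNullAt(int ordinal) {
    return (words[ordinal >>> 6] & (1L << (ordinal & 63))) != 0;
  }
}
```

Because the bits live in the shared buffer rather than in any one writer, several writers backed by the same BufferHolder see a consistent view, which is davies' point here.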
[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-150044370 @JoshRosen If we only scan the cached string column and access it (without adding a ConvertToUnsafe), this could be a little slower, because we now copy the bytes into the UnsafeRow. As for the UTF8String object, we still create it either way: before this patch we created it in ColumnAccessor, and after this patch we create it in UnsafeRow.getUTF8String() (we could actually re-use that).
[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-150044743 **[Test build #44092 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44092/consoleFull)** for PR 9203 at commit [`e33170d`](https://github.com/apache/spark/commit/e33170d793993bcfc48bdc0c159b3fe51b4f0861). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class MutableUnsafeRow(val writer: UnsafeRowWriter) extends GenericMutableRow(null)`
[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-150075430 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44100/
[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-150075426 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-150075109 **[Test build #44100 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44100/consoleFull)** for PR 9203 at commit [`cab9286`](https://github.com/apache/spark/commit/cab9286b15bd784a023cd2b2c5c3ad755ebb566b). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class MutableUnsafeRow(val writer: UnsafeRowWriter) extends GenericMutableRow(null)`
[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-150084996 I'm going to merge this first. @davies please take a look at @cloud-fan 's comments, and address them in follow-up prs if necessary.
[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/9203
[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/9203#discussion_r42700611

--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java ---

@@ -64,29 +63,72 @@ public void setOffset(int ordinal) {
     Platform.putInt(holder.buffer, getElementOffset(ordinal), relativeOffset);
   }

-  public void writeCompactDecimal(int ordinal, Decimal input, int precision, int scale) {
-    // make sure Decimal object has the same scale as DecimalType
-    if (input.changePrecision(precision, scale)) {
-      Platform.putLong(holder.buffer, holder.cursor, input.toUnscaledLong());
-      setOffset(ordinal);
-      holder.cursor += 8;
-    } else {
-      setNullAt(ordinal);
+  public void write(int ordinal, boolean value) {
--- End diff --

Why implement these primitive write methods in the writer class and not in codegen?
[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/9203#discussion_r42701316

--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java ---

+  public void write(int ordinal, boolean value) {
--- End diff --

Those in UnsafeRowWriter are needed for ColumnAccessor. The purpose of these *Writer classes is to simplify the generated code (which is already hard to understand), and these methods also go in that direction.
[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/9203#discussion_r42701494

--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java ---

+  public void write(int ordinal, boolean value) {
--- End diff --

I think it is simpler to put them there rather than in generated code.
[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/9203#discussion_r42701674

--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java ---

+  public boolean isNullAt(int ordinal) {
--- End diff --

where do we use this method?
[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/9203#discussion_r42702161

--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java ---

+  public void write(int ordinal, boolean value) {
+    Platform.putBoolean(holder.buffer, getFieldOffset(ordinal), value);
--- End diff --

I think we should zero out the whole 8 bytes before writing. For an inner unsafe row, its start position may change across multiple writes, so these 8 bytes are not always clean: if we only write a boolean here, the remaining 7 bytes may not be zero, which will break the equality of unsafe rows.
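cloud-fan's concern can be reproduced with plain byte arrays: writing a one-byte boolean into an 8-byte slot that still holds stale bytes makes two semantically equal rows differ byte-wise, while zeroing the whole word first restores byte-wise equality. This is an illustrative standalone demo, not Spark's code.

```java
import java.util.Arrays;

public class ZeroOutDemo {
  // Touches only 1 of the slot's 8 bytes, leaving any stale bytes in place.
  static byte[] writeBooleanDirty(byte[] slot, boolean v) {
    slot[0] = (byte) (v ? 1 : 0);
    return slot;
  }

  // Zero-out the whole 8-byte word first, as the review suggests.
  static byte[] writeBooleanClean(byte[] slot, boolean v) {
    Arrays.fill(slot, (byte) 0);
    slot[0] = (byte) (v ? 1 : 0);
    return slot;
  }

  public static void main(String[] args) {
    byte[] fresh = writeBooleanDirty(new byte[8], true);
    byte[] stale = new byte[8];
    stale[5] = 42;   // leftover garbage from an earlier use of the buffer
    // Same logical value `true`, different bytes -> broken byte-wise equality:
    System.out.println(Arrays.equals(fresh, writeBooleanDirty(stale, true)));  // false
    System.out.println(Arrays.equals(fresh, writeBooleanClean(stale, true)));  // true
  }
}
```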
[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/9203#discussion_r42702305

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala ---

@@ -124,17 +124,10 @@ object GenerateUnsafeProjection extends CodeGenerator[Seq[Expression], UnsafePro
       """

     case _ if ctx.isPrimitiveType(dt) =>
-      val fieldOffset = ctx.freshName("fieldOffset")
       s"""
-        final long $fieldOffset = $rowWriter.getFieldOffset($index);
-        Platform.putLong($bufferHolder.buffer, $fieldOffset, 0L);
-        ${writePrimitiveType(ctx, input.value, dt, s"$bufferHolder.buffer", fieldOffset)}
+        $rowWriter.write($index, ${input.value});
       """

-    case t: DecimalType if t.precision <= Decimal.MAX_LONG_DIGITS =>
--- End diff --

I think a benefit of the old code was removing an if branch, since we know the decimal type before generating code. Will this affect performance?
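The branch in question distinguishes decimals whose precision fits in a long from wider ones: the `precision <= Decimal.MAX_LONG_DIGITS()` path stores only the unscaled value as a single 8-byte long, with scale and precision living in the `DecimalType` rather than in the row. A sketch of that compact encoding, with `java.math.BigDecimal` standing in for Spark's `Decimal` (method names are ours):

```java
import java.math.BigDecimal;

public class CompactDecimal {
  // What the writer putLong()s into the fixed-width slot.
  static long toUnscaledLong(BigDecimal d) {
    return d.unscaledValue().longValueExact();
  }

  // The reader side only needs the stored long plus the type's scale.
  static BigDecimal fromUnscaledLong(long unscaled, int scale) {
    return BigDecimal.valueOf(unscaled, scale);
  }

  public static void main(String[] args) {
    BigDecimal d = new BigDecimal("123.45");   // precision 5, scale 2: fits a long
    long word = toUnscaledLong(d);             // 12345
    System.out.println(fromUnscaledLong(word, 2));  // prints 123.45
  }
}
```

Because the precision is a property of the static schema, codegen could indeed emit only one of the two paths; the follow-up comment argues the runtime branch is cheap anyway.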
[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-150070715 **[Test build #44098 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44098/consoleFull)** for PR 9203 at commit [`1421bce`](https://github.com/apache/spark/commit/1421bce528a460ecacaef69524cd28cfbc8a0057). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class MutableUnsafeRow(val writer: UnsafeRowWriter) extends GenericMutableRow(null)`
[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-150070861 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44098/
[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-150070859 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/9203#discussion_r42703195

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala ---

-    case t: DecimalType if t.precision <= Decimal.MAX_LONG_DIGITS =>
--- End diff --

it won't matter as much, since the branch predictor can mostly eliminate them.
[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-150053054 Merged build started.
[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-150053026 Merged build triggered.
[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-150054848 **[Test build #44100 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44100/consoleFull)** for PR 9203 at commit [`cab9286`](https://github.com/apache/spark/commit/cab9286b15bd784a023cd2b2c5c3ad755ebb566b).
[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-150053373 **[Test build #44098 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44098/consoleFull)** for PR 9203 at commit [`1421bce`](https://github.com/apache/spark/commit/1421bce528a460ecacaef69524cd28cfbc8a0057).
[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-150053787 Merged build started.
[GitHub] spark pull request: [SPARK-11243] [SQL] output UnsafeRow from colu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9203#issuecomment-150053767 Merged build triggered.