[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

2017-03-03 Thread robbinspg
Github user robbinspg commented on the issue:

https://github.com/apache/spark/pull/17039
  
OK, there were a couple of similar issues, such as in-set-operations query 9 and 
group-analytics.sql.out queries 21 and 22.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

2017-03-03 Thread robbinspg
Github user robbinspg commented on the issue:

https://github.com/apache/spark/pull/17039
  
ok so here is an example of output I'm not sure is correct:

in-order-by

-- !query 17
SELECT Count(DISTINCT(t1a)),
       t1b
FROM   t1
WHERE  t1h NOT IN (SELECT t2h
                   FROM   t2
                   WHERE  t1a = t2a
                   ORDER  BY t2d DESC NULLS FIRST)
GROUP  BY t1a,
          t1b
ORDER  BY t1b DESC NULLS LAST
-- !query 17 schema
struct<count(DISTINCT t1a):bigint,t1b:smallint>
-- !query 17 output
1   10
1   10
1   16
1   6
1   8
1   NULL

That is the "new" output with your change, but it doesn't actually match 
what you'd expect from that query (it isn't sorted by t1b DESC), which would be:

1   16
1   10
1   10
1   8
1   6
1   NULL
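For reference, the expected ordering from `ORDER BY t1b DESC NULLS LAST` can be reproduced outside Spark. This is a minimal sketch using the values quoted above; the tuple key sorts a NULL flag before the value, so NULL rows land last:

```python
# Minimal sketch of "ORDER BY t1b DESC NULLS LAST" over the quoted rows.
rows = [(1, 10), (1, 10), (1, 16), (1, 6), (1, 8), (1, None)]

# Sort key: NULL rows (None) get a True flag so they sort after all
# non-NULL rows; non-NULL rows sort by negated value, i.e. descending.
ordered = sorted(rows, key=lambda r: (r[1] is None, -(r[1] or 0)))

for cnt, t1b in ordered:
    print(cnt, t1b)
# Expected order of t1b: 16, 10, 10, 8, 6, None
```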





[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

2017-03-01 Thread robbinspg
Github user robbinspg commented on the issue:

https://github.com/apache/spark/pull/17039
  
@hvanhovell So I backed out the changes in this PR, implemented your change 
to SQLQueryTestSuite.getNormalizedResult, regenerated the golden results files, 
and the tests all pass on my x86 and big-endian platforms.

The results files that changed:
sql/core/src/test/resources/sql-tests/results/group-analytics.sql.out

sql/core/src/test/resources/sql-tests/results/order-by-nulls-ordering.sql.out

sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-joins.sql.out

sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-order-by.sql.out

sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-set-operations.sql.out

sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/not-in-joins.sql.out


So, should I abandon this PR and go with your solution? I can submit your 
change in a PR along with updated results files if you want.





[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

2017-02-28 Thread robbinspg
Github user robbinspg commented on the issue:

https://github.com/apache/spark/pull/17039
  
I think that the current "order if not currently ordered" behavior in the test 
suite is good for checking the set of results for unordered queries.

If a query is ordered at all, then its results should be deterministic, given 
that the input data and the query are part of the test; otherwise it is a bad 
test. So I think this PR is the way to go.
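The behavior described above can be sketched concretely. This is an illustrative stand-in, not Spark's actual SQLQueryTestSuite code: sort rows into a canonical order only when the query itself imposes none, so unordered results are compared as a multiset:

```python
def normalize(rows, query_is_sorted):
    """Illustrative stand-in for golden-file normalization: if the query
    imposes no ordering, row order is not part of the contract, so sort
    rows into a canonical order before diffing against expected output."""
    if query_is_sorted:
        return list(rows)   # ordered query: order must match exactly
    return sorted(rows)     # unordered query: compare as a multiset

# Two valid executions of the same unordered query compare equal:
a = normalize([("b", 2), ("a", 1)], query_is_sorted=False)
b = normalize([("a", 1), ("b", 2)], query_is_sorted=False)
assert a == b
```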





[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

2017-02-27 Thread robbinspg
Github user robbinspg commented on the issue:

https://github.com/apache/spark/pull/17039
  
Jenkins retest please





[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

2017-02-27 Thread robbinspg
Github user robbinspg commented on the issue:

https://github.com/apache/spark/pull/17039
  
@gatorsmile  I'm glad it wasn't just me that found it complex ;-)

I've modified the patch to remove an unnecessary change, as that query was 
not ordered and the test suite code handles that case.





[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

2017-02-24 Thread robbinspg
Github user robbinspg commented on the issue:

https://github.com/apache/spark/pull/17039
  
@hvanhovell @gatorsmile I agree that would be a better solution; however, I 
don't know how to achieve that, being unfamiliar with this code.





[GitHub] spark pull request #17039: [SPARK-19710] Fix ordering of rows in query resul...

2017-02-23 Thread robbinspg
GitHub user robbinspg opened a pull request:

https://github.com/apache/spark/pull/17039

[SPARK-19710] Fix ordering of rows in query results

## What changes were proposed in this pull request?
Changes to SQLQueryTests to make the order of the results constant.
Where possible, an ORDER BY has been added to match the existing expected output.

## How was this patch tested?
Test runs on x86, zLinux (big endian), ppc (big endian)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/robbinspg/spark-1 SPARK-19710

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17039.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17039


commit fbc46a6f5ec2a4aaf3c6b4d5d776ccc7b114842f
Author: Pete Robbins <robbin...@gmail.com>
Date:   2016-12-21T10:06:41Z

o.a.s.unsafe.types.UTF8StringSuite.writeToOutputStreamIntArray test
fails on big endian. Only change byte order on little endian

commit 30e20be2c199cc57a6a85547770dfa6fc3d32752
Author: Pete Robbins <robbin...@gmail.com>
Date:   2016-12-22T09:14:14Z

Simplify setting  of byte order

commit 145c76a2ce4b53726c209f04e0a230692b395369
Author: Pete Robbins <robbin...@gmail.com>
Date:   2016-12-22T16:08:38Z

Merge branch 'master' of https://github.com/apache/spark.git

commit f0e77f29f1dca2198a87efa28fb01fd247162ceb
Author: Pete Robbins <robbin...@gmail.com>
Date:   2016-12-23T08:55:15Z

Merge branch 'master' of https://github.com/apache/spark.git

commit 1bc1adf48dae6b1047ff4d4e3d467ffec88abe12
Author: Pete Robbins <robbin...@gmail.com>
Date:   2016-12-23T08:56:42Z

remove redundant comment

commit ea259fc7a00b3aab1dd554ec9850d057407b1875
Author: Pete Robbins <robbin...@gmail.com>
Date:   2017-01-03T10:29:01Z

Merge branch 'master' of https://github.com/apache/spark.git

commit f4b76a779df4c9b952114a59907a27a2eddd9898
Author: Pete Robbins <robbin...@gmail.com>
Date:   2017-02-03T13:21:41Z

Merge branch 'master' of https://github.com/apache/spark.git

commit b5571ea47408f2027a73368adf763b0f0d60eba0
Author: Pete Robbins <robbin...@gmail.com>
Date:   2017-02-13T13:13:05Z

Merge branch 'master' of https://github.com/apache/spark.git

commit 191777387b4b76afd8442ddc9b33815bd487dfe6
Author: Pete Robbins <robbin...@gmail.com>
Date:   2017-02-14T10:57:47Z

Merge branch 'master' of https://github.com/apache/spark.git

commit a832b740a5e29b25e8e31cd5de73a311f855a12d
Author: Pete Robbins <robbin...@gmail.com>
Date:   2017-02-20T13:18:05Z

Merge branch 'master' of https://github.com/apache/spark.git

commit bafe31ccbefdc80d434e34baae8600cb6ef26c56
Author: Pete Robbins <robbin...@gmail.com>
Date:   2017-02-23T11:29:32Z

Merge branch 'master' of https://github.com/apache/spark.git

commit 950415f98c4574532da9dd089e5a9b027d3683d8
Author: Pete Robbins <robbin...@gmail.com>
Date:   2017-02-23T11:33:53Z

Update tests to produce reliably ordered results







[GitHub] spark issue #16841: [SPARK-18871][SQL][TESTS] New test cases for IN/NOT IN s...

2017-02-23 Thread robbinspg
Github user robbinspg commented on the issue:

https://github.com/apache/spark/pull/16841
  
OK I'll raise a separate Jira, document the differences and submit a PR





[GitHub] spark issue #16841: [SPARK-18871][SQL][TESTS] New test cases for IN/NOT IN s...

2017-02-22 Thread robbinspg
Github user robbinspg commented on the issue:

https://github.com/apache/spark/pull/16841
  
@kevinyu98 Several of the new tests fail on big-endian platforms. It 
appears that rows are returned in a slightly different order but are still 
correct output for the query. For example, in-joins query 4:


-- !query 4
SELECT    Count(DISTINCT(t1a)),
          t1b,
          t3a,
          t3b,
          t3c
FROM      t1 NATURAL LEFT JOIN t3
WHERE     t1a IN (SELECT t2a
                  FROM   t2
                  WHERE  t1d = t2d)
AND       t1b > t3b
GROUP BY  t1a,
          t1b,
          t3a,
          t3b,
          t3c
ORDER BY  t1a DESC


On little endian this returns:
1   10  val3b   8   NULL
1   10  val1b   8   16
1   10  val3a   6   12
1   8   val3a   6   12
1   8   val3a   6   12

whereas on big endian it returns:
1   10  val3a   6   12
1   10  val3b   8   NULL
1   10  val1b   8   16
1   8   val3a   6   12
1   8   val3a   6   12

I believe GROUP BY does not define any ordering, so both of these outputs 
are valid for the query, as the ORDER BY is only on t1a; but obviously the 
big-endian output does not match your expected output, so the test fails.

I'm trying to determine why the execution on big endian returns the rows in 
a different order.
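The platform-dependent tie-breaking described here can be illustrated in a few lines. The row values below are borrowed from the quoted output, and the two "hash orders" are invented for illustration:

```python
# When ORDER BY covers only one column (t1a here), rows that tie on that
# column may come out in whatever order the aggregation produced them,
# which can differ between platforms (e.g. hash order).
rows = [(1, 10, "val3b"), (1, 10, "val3a"), (1, 8, "val3a")]

le_agg_order = [rows[0], rows[1], rows[2]]   # pretend little-endian hash order
be_agg_order = [rows[1], rows[0], rows[2]]   # pretend big-endian hash order

# A stable sort on the first column alone leaves tied rows as-is, so both
# results satisfy the ORDER BY while differing row-for-row.
le_sorted = sorted(le_agg_order, key=lambda r: r[0])
be_sorted = sorted(be_agg_order, key=lambda r: r[0])
assert le_sorted != be_sorted                  # different row order...
assert sorted(le_sorted) == sorted(be_sorted)  # ...but the same multiset
```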





[GitHub] spark pull request #16795: [SPARK-19409][BUILD] Fix ParquetAvroCompatibility...

2017-02-06 Thread robbinspg
Github user robbinspg commented on a diff in the pull request:

https://github.com/apache/spark/pull/16795#discussion_r99622258
  
--- Diff: sql/core/pom.xml ---
@@ -130,6 +130,12 @@
   test
 
 
+  org.apache.avro
--- End diff --

Is this issue only a test dependency? I see avro.version declared in the 
top-level pom.xml.





[GitHub] spark issue #16751: [SPARK-19409][BUILD] Bump parquet version to 1.8.2

2017-02-06 Thread robbinspg
Github user robbinspg commented on the issue:

https://github.com/apache/spark/pull/16751
  
Sorry, I've been away for the weekend. Yes, we use Maven for our test runs. 
Looks like you have it under control.
Thanks





[GitHub] spark issue #16751: [SPARK-19409][BUILD] Bump parquet version to 1.8.2

2017-02-03 Thread robbinspg
Github user robbinspg commented on the issue:

https://github.com/apache/spark/pull/16751
  
Since this commit, our test runs are failing with:
ParquetAvroCompatibilitySuite:

*** RUN ABORTED ***
  java.lang.NoClassDefFoundError: org/apache/avro/LogicalType
  at 
org.apache.parquet.avro.AvroParquetWriter.writeSupport(AvroParquetWriter.java:144)
  at 
org.apache.parquet.avro.AvroParquetWriter.access$100(AvroParquetWriter.java:35)
  at 
org.apache.parquet.avro.AvroParquetWriter$Builder.getWriteSupport(AvroParquetWriter.java:173)

Does the avro.version also need to be bumped to 1.8.x?





[GitHub] spark issue #16375: [SPARK-18963] o.a.s.unsafe.types.UTF8StringSuite.writeTo...

2016-12-22 Thread robbinspg
Github user robbinspg commented on the issue:

https://github.com/apache/spark/pull/16375
  
Test run is failing with an unrelated error





[GitHub] spark issue #16375: [SPARK-18963] o.a.s.unsafe.types.UTF8StringSuite.writeTo...

2016-12-22 Thread robbinspg
Github user robbinspg commented on the issue:

https://github.com/apache/spark/pull/16375
  
Jenkins retest this please





[GitHub] spark pull request #16375: [SPARK-18963] o.a.s.unsafe.types.UTF8StringSuite....

2016-12-22 Thread robbinspg
Github user robbinspg commented on a diff in the pull request:

https://github.com/apache/spark/pull/16375#discussion_r93587770
  
--- Diff: 
common/unsafe/src/test/java/org/apache/spark/unsafe/types/UTF8StringSuite.java 
---
@@ -591,7 +591,11 @@ public void writeToOutputStreamIntArray() throws 
IOException {
 // verify that writes work on objects that are not byte arrays
 final ByteBuffer buffer = 
StandardCharsets.UTF_8.encode("大千世界");
 buffer.position(0);
-buffer.order(ByteOrder.LITTLE_ENDIAN);
+
--- End diff --

OK, I'll change it to that.
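The portability rule behind this diff can be sketched outside Java: forcing a little-endian interpretation of natively laid-out data only round-trips on little-endian machines, which is why the fix branches on the native byte order instead of hard-coding LITTLE_ENDIAN. Function names here are invented for illustration:

```python
import sys

def int_bytes_native(value):
    # Native-order layout of a 32-bit int, as an in-memory array would hold it.
    return value.to_bytes(4, sys.byteorder)

def read_as_little_endian(b):
    # What a hard-coded LITTLE_ENDIAN reader sees.
    return int.from_bytes(b, "little")

def read_native(b):
    # What a reader using the platform's native order sees.
    return int.from_bytes(b, sys.byteorder)

v = 0x12345678
# Reading in native order always round-trips; forcing little-endian
# only round-trips when the platform itself is little-endian.
assert read_native(int_bytes_native(v)) == v
assert (read_as_little_endian(int_bytes_native(v)) == v) == (sys.byteorder == "little")
```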





[GitHub] spark pull request #16375: [SPARK-18963] o.a.s.unsafe.types.UTF8StringSuite.w...

2016-12-21 Thread robbinspg
GitHub user robbinspg opened a pull request:

https://github.com/apache/spark/pull/16375

[SPARK-18963] o.a.s.unsafe.types.UTF8StringSuite.writeToOutputStreamIntArray 
test

fails on big endian. Only change byte order on little endian

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/robbinspg/spark-1 SPARK-18963

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16375.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16375


commit fbc46a6f5ec2a4aaf3c6b4d5d776ccc7b114842f
Author: Pete Robbins <robbin...@gmail.com>
Date:   2016-12-21T10:06:41Z

o.a.s.unsafe.types.UTF8StringSuite.writeToOutputStreamIntArray test
fails on big endian. Only change byte order on little endian







[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-14 Thread robbinspg
Github user robbinspg commented on the issue:

https://github.com/apache/spark/pull/15307
  
This PR seems to cause intermittent test failures, e.g.: 
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/1736/testReport/junit/org.apache.spark.sql.streaming/StreamingQueryListenerSuite/single_listener__check_trigger_statuses/






[GitHub] spark issue #15464: [SPARK-17827][SQL]maxColLength type should be Int for St...

2016-10-13 Thread robbinspg
Github user robbinspg commented on the issue:

https://github.com/apache/spark/pull/15464
  
Tests all pass on big-endian with this PR





[GitHub] spark issue #15464: [SPARK-17827][SQL]maxColLength type should be Int for St...

2016-10-13 Thread robbinspg
Github user robbinspg commented on the issue:

https://github.com/apache/spark/pull/15464
  
This PR contains a change to o.a.s.sql.hive.StatisticsSuite, which I 
believe should fix that issue (awaiting big-endian build to complete).





[GitHub] spark pull request #15464: [SPARK-17827][SQL]maxColLength type should be Int...

2016-10-13 Thread robbinspg
GitHub user robbinspg opened a pull request:

https://github.com/apache/spark/pull/15464

[SPARK-17827][SQL]maxColLength type should be Int for String and Binary

## What changes were proposed in this pull request?
Correct the expected type from the Length function to be Int.

## How was this patch tested?
Test runs on little endian and big endian platforms


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/robbinspg/spark-1 SPARK-17827

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15464.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15464


commit 559c3b905bb4a95e880051a7066438705bb1ecfd
Author: Pete Robbins <robbin...@gmail.com>
Date:   2016-10-13T12:58:40Z

Max length returns an Int for String and binary







[GitHub] spark issue #13652: [SPARK-15613] [SQL] Fix incorrect days to millis convers...

2016-06-20 Thread robbinspg
Github user robbinspg commented on the issue:

https://github.com/apache/spark/pull/13652
  
Also failing here in the UK:
{noformat}
- to UTC timestamp *** FAILED ***
  "2016-03-13 [02]:00:00.0" did not equal "2016-03-13 [10]:00:00.0" 
(DateTimeUtilsSuite.scala:506)
{noformat}





[GitHub] spark issue #13707: [WIP][SPARK-15822][SQL] avoid UTF8String references into...

2016-06-16 Thread robbinspg
Github user robbinspg commented on the issue:

https://github.com/apache/spark/pull/13707
  
So clearly that code doesn't work when the type is a primitive. I'm not 
familiar with the code generation. Is there a way to detect the type during 
generation rather than generating the dodgy "isinstanceof" code?
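One answer to this question can be sketched with an invented miniature code generator: branch on the statically known type while generating, so primitives never get a runtime instance check. This is only an illustration, not Spark's actual codegen:

```python
# Illustrative codegen helper: decide at generation time, from the static
# type, whether the emitted expression needs a defensive copy. No runtime
# "isinstanceof" check is ever emitted, so primitives are handled safely.
# The type names and emitted snippets are invented for illustration.
def gen_copy_expr(var_name, data_type):
    if data_type == "UTF8String":
        # Only string values point into shared memory and need copying.
        return f"{var_name}.clone()"
    # Primitives are copied by value; emit the variable as-is.
    return var_name

assert gen_copy_expr("value1", "UTF8String") == "value1.clone()"
assert gen_copy_expr("value2", "int") == "value2"
```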





[GitHub] spark issue #13707: [SPARK-15822][SQL] avoid UTF8String references into free...

2016-06-16 Thread robbinspg
Github user robbinspg commented on the issue:

https://github.com/apache/spark/pull/13707
  
@davies @hvanhovell Can you take a look at this? I'm not sure it is the 
best fix. Also, are there any other types (structs, arrays, etc.) that are 
created by pointing into an UnsafeRow that could be equally affected?






[GitHub] spark pull request #13707: [SPARK-15822][SQL] avoid UTF8String references in...

2016-06-16 Thread robbinspg
GitHub user robbinspg opened a pull request:

https://github.com/apache/spark/pull/13707

[SPARK-15822][SQL] avoid UTF8String references into freed pages

## What changes were proposed in this pull request?

In SMJ codegen we need to save copies of UTF8String values, as the final 
iterator.next() will free the underlying memory page.


## How was this patch tested?

The test application described in SPARK-15822 now passes.
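The bug class being fixed can be sketched with a small stand-in for a managed memory page; the names are invented for illustration, and Spark's actual UTF8String and page management are more involved:

```python
# A value that is merely a view into a shared memory page becomes garbage
# once the page is reused; a defensive copy survives.
page = bytearray(b"hello world")

view = memoryview(page)[0:5]      # "UTF8String"-style pointer into the page
copy = bytes(view)                # defensive copy, as the fix makes

page[0:5] = b"XXXXX"              # page reused by the next iterator.next()

assert bytes(view) == b"XXXXX"    # the view silently changed underneath us
assert copy == b"hello"          # the copy is still valid
```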



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/robbinspg/spark-1 master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13707.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13707


commit d201034e628456b9640eb8483da849217aa80c92
Author: Pete Robbins <robbin...@gmail.com>
Date:   2016-06-16T14:33:31Z

Create copy of UTF8String in SMJ

commit 1288f1e88f90b751cd0640864d7cc6cb5a9dfeca
Author: Pete Robbins <robbin...@gmail.com>
Date:   2016-06-16T15:21:23Z

make copy() public







[GitHub] spark issue #13589: [SPARK-15822][SPARK-15825][SQL] Fix SMJ Segfault/Invalid...

2016-06-10 Thread robbinspg
Github user robbinspg commented on the issue:

https://github.com/apache/spark/pull/13589
  
As Adam says, I still get the SIGSEGV with OpenJDK on Linux amd64 running our 
app. This fix does appear to fix the issue reported in 
https://issues.apache.org/jira/browse/SPARK-15825





[GitHub] spark issue #13355: [SPARK-15606][core] Use non-blocking removeExecutor call...

2016-06-02 Thread robbinspg
Github user robbinspg commented on the issue:

https://github.com/apache/spark/pull/13355
  
@zsxwing ok to merge now?





[GitHub] spark issue #13355: [SPARK-15606][core] Use non-blocking removeExecutor call...

2016-06-01 Thread robbinspg
Github user robbinspg commented on the issue:

https://github.com/apache/spark/pull/13355
  
Test suite removed





[GitHub] spark pull request: [SPARK-15606][core] Use a minimum of 3 dispatcher thread...

2016-06-01 Thread robbinspg
Github user robbinspg commented on a diff in the pull request:

https://github.com/apache/spark/pull/13355#discussion_r65332472
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/BlockManagerMaster.scala ---
@@ -38,7 +38,8 @@ class BlockManagerMaster(
 
   /** Remove a dead executor from the driver endpoint. This is only called 
on the driver side. */
   def removeExecutor(execId: String) {
-tell(RemoveExecutor(execId))
--- End diff --

OK so I've added a new removeExecutorAsync method to minimise side effects





[GitHub] spark pull request: [SPARK-15606][core] Use a minimum of 3 dispatcher thread...

2016-05-31 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/13355
  
Reverted the original fix and replaced it with a non-blocking call in 
BlockManagerMaster.removeExecutor.

Also added a new test suite that runs the Distributed suite forcing the number 
of dispatcher threads to 2. This suite will fail without the fix.





[GitHub] spark pull request: [SPARK-15606][core] Use a minimum of 3 dispatcher thread...

2016-05-31 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/13355
  
OK, that's what I tried, but it threw up some errors in some other tests, 
which I'm investigating.





[GitHub] spark pull request: [SPARK-15606][core] Use a minimum of 3 dispatcher thread...

2016-05-31 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/13355
  
@zsxwing  Do you mean change BlockManagerMaster.removeExecutor to send the 
message using send (fire and forget) rather than askWithRetry? 





[GitHub] spark pull request: [SPARK-15606][core] Use a minimum of 3 dispatc...

2016-05-27 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/13355#issuecomment-09844
  
agreed. I'll take a look.





[GitHub] spark pull request: [SPARK-15606][core] Use a minimum of 3 dispatc...

2016-05-27 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/13355#issuecomment-222104521
  
Although this patch resolves this particular issue I would echo the comment 
in https://github.com/apache/spark/pull/11728 by @zsxwing 

> However, the root cause is there are blocking calls in the event loops but 
> not enough threads. This could happen in other places (such as netty, akka). 
> Ideally, we should avoid blocking calls in all event loops. However, it's hard 
> to figure out all of them in the huge code bases :(
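The hazard described in that quote can be sketched with plain JDK executors (an illustration only, not Spark's RPC API — `dispatcher`, `done`, and the message names are hypothetical): a single-threaded event loop survives a fire-and-forget send, whereas a handler that blocked waiting on work queued to the same loop would deadlock.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class EventLoopSketch {
    public static void main(String[] args) throws Exception {
        // A one-thread "dispatcher" standing in for an event loop with too few threads.
        ExecutorService dispatcher = Executors.newSingleThreadExecutor();

        // Fire-and-forget: the handler just enqueues follow-up work and returns,
        // so the single dispatcher thread is never blocked.
        CompletableFuture<String> done = new CompletableFuture<>();
        dispatcher.execute(() ->
            dispatcher.execute(() -> done.complete("handled")));

        System.out.println(done.get(5, TimeUnit.SECONDS)); // prints handled

        // A blocking variant -- the outer task calling get() on a future for work
        // queued to the same dispatcher -- would deadlock here instead.
        dispatcher.shutdown();
    }
}
```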





[GitHub] spark pull request: Use a minimum of 3 dispatcher threads to avoid...

2016-05-27 Thread robbinspg
GitHub user robbinspg opened a pull request:

https://github.com/apache/spark/pull/13355

Use a minimum of 3 dispatcher threads to avoid deadlocks

## What changes were proposed in this pull request?
Set minimum number of dispatcher threads to 3 to avoid deadlocks on 
machines with only 2 cores
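The sizing rule proposed above can be sketched as a simple clamp (a minimal illustration, not the exact patch; the method name is hypothetical):

```java
public class DispatcherThreads {
    // Clamp the dispatcher pool to at least 3 threads so that, even on a
    // 2-core machine, one blocking handler cannot starve the handlers that
    // must run to unblock it.
    static int numDispatcherThreads(int availableProcessors) {
        return Math.max(3, availableProcessors);
    }

    public static void main(String[] args) {
        System.out.println(numDispatcherThreads(2)); // prints 3: a 2-core box still gets 3 threads
        System.out.println(numDispatcherThreads(8)); // prints 8
    }
}
```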


## How was this patch tested?

Spark test builds



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/robbinspg/spark-1 SPARK-13906

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13355.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13355


commit 139e87558d728c5ae4ccf297c1702a73d5573335
Author: Pete Robbins <robbin...@gmail.com>
Date:   2016-05-27T09:32:49Z

Use a minimum of 3 dispatcher threads to avoid deadlocks







[GitHub] spark pull request: [Spark-15154][SQL] Change key types to Long in...

2016-05-10 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/13009#issuecomment-218105338
  
Should I add an assert into LongHashedRelation.apply to validate the 
key, and a test to cover this?





[GitHub] spark pull request: [Spark-15154][SQL] Change key types to Long in...

2016-05-09 Thread robbinspg
GitHub user robbinspg opened a pull request:

https://github.com/apache/spark/pull/13009

[Spark-15154][SQL] Change key types to Long in tests

## What changes were proposed in this pull request?

As reported in the JIRA, the two tests changed here use a key of type 
Integer where the Spark SQL code assumes the type is Long. This PR changes the 
tests to use the correct key types.
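The class of failure being fixed can be illustrated with plain JDK collections (a hypothetical example, not the HashedRelation API): a structure populated with boxed Integer keys silently misses lookups performed with Long keys, because the two boxed types never compare equal.

```java
import java.util.HashMap;
import java.util.Map;

public class KeyTypeSketch {
    public static void main(String[] args) {
        // A map populated with keys autoboxed to Integer...
        Map<Object, String> rows = new HashMap<>();
        rows.put(1, "row1");

        // ...misses when the lookup side assumes Long keys, because
        // Integer.valueOf(1) and Long.valueOf(1L) are not equal.
        System.out.println(rows.get(1L)); // prints null
        System.out.println(rows.get(1));  // prints row1
    }
}
```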


## How was this patch tested?

Test builds run on both Big Endian and Little Endian platforms




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/robbinspg/spark-1 HashedRelationSuiteFix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13009.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13009


commit 96a0dff9e08727d6885f2c5b8c30ec1281714ce6
Author: Pete Robbins <robbin...@gmail.com>
Date:   2016-05-05T12:19:31Z

Change key types to Long in tests







[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...

2016-05-02 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/12397#issuecomment-216360442
  
Many thanks!





[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...

2016-04-29 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/12397#issuecomment-215690156
  
Sorry to keep bugging you on this, but I'd really like to fix this major 
issue and move on. If there are no objections to merging this into master, 
could a committer please do the honours? I don't want to have to create and 
maintain a Big Endian fork if possible.

Cheers





[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...

2016-04-27 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/12397#issuecomment-214981491
  
@rxin @hvanhovell Is there anything preventing this being merged? IMHO the 
JIRA it is fixing is a blocking defect.





[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...

2016-04-26 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/12397#issuecomment-214629761
  
@hvanhovell Spark 1.6.1 is fine on BE. The issues have been with new 
functionality added for Spark 2.0, and this PR fixes the major one. There are 
a few other issues that need investigating, which may be flaky tests, but this 
is the most important.





[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...

2016-04-25 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/12397#issuecomment-214429928
  
@hvanhovell Can we merge this now?

I agree the benchmarks should run after a steady state is achieved. Also, 
I'll probably create a change to allow the Benchmark to output in CSV format!





[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...

2016-04-22 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/12397#issuecomment-213670252
  
Can we retest this, as I think there was a minor change since the test build.





[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...

2016-04-22 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/12501#issuecomment-213670198
  
Closing this in favour of the other implementation.





[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...

2016-04-22 Thread robbinspg
Github user robbinspg closed the pull request at:

https://github.com/apache/spark/pull/12501





[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...

2016-04-22 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/12397#issuecomment-213482568
  
@rxin Do you think we can merge this PR?





[GitHub] spark pull request: [SPARK-14848][SQL] Compare as Set in DatasetSu...

2016-04-22 Thread robbinspg
GitHub user robbinspg opened a pull request:

https://github.com/apache/spark/pull/12610

[SPARK-14848][SQL] Compare as Set in DatasetSuite - Java encoder

## What changes were proposed in this pull request?
Change test to compare sets rather than sequence
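A minimal sketch of the order-insensitive comparison this change switches to (illustrative values, not the actual DatasetSuite rows): results that arrive in a platform-dependent order fail a sequence comparison but pass a set comparison.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

public class SetCompare {
    public static void main(String[] args) {
        // The same rows, returned in a different (platform-dependent) order.
        List<String> expected = Arrays.asList("a", "b", "c");
        List<String> actual = Arrays.asList("c", "a", "b");

        // Sequence comparison is order-sensitive; set comparison is not.
        boolean seqEqual = expected.equals(actual);
        boolean setEqual = new HashSet<>(expected).equals(new HashSet<>(actual));
        System.out.println(seqEqual + " " + setEqual); // prints false true
    }
}
```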


## How was this patch tested?
Full test runs on little endian and big endian platforms




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/robbinspg/spark-1 DatasetSuiteFix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12610.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12610


commit 9203f72155aa5fe4e7ebba158591d6961035371a
Author: Pete Robbins <robbin...@gmail.com>
Date:   2016-04-22T12:19:15Z

Compare as Set in DatasetSuite - Java encoder







[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...

2016-04-21 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/12397#issuecomment-212930732
  
@hvanhovell Here are the test results running 10x the size:

[ParquetReadBenchmarks.txt](https://github.com/apache/spark/files/230027/ParquetReadBenchmarks.txt)

I'm not sure there is a lot in it; if it were up to me, I'd go with the 
implementation in this PR rather than the subclassing.





[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...

2016-04-21 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/12397#issuecomment-212789051
  
@hvanhovell Any thoughts/interpretations on those benchmark results? I 
think the differences are all within the bounds of randomness!





[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...

2016-04-20 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/12397#issuecomment-212404982
  

[ParquetReadBenchmark-PartitionedTable.txt](https://github.com/apache/spark/files/227908/ParquetReadBenchmark-PartitionedTable.txt)






[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...

2016-04-20 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/12397#issuecomment-212388759
  
Averaged results for 5 runs of the first 3 benchmarks:


[ParquetReadBenchmark.txt](https://github.com/apache/spark/files/227827/ParquetReadBenchmark.txt)






[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...

2016-04-20 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/12397#issuecomment-212358250
  
@hvanhovell  Yes, I will. I'm trying to get a stable base benchmark first: 
running the ParquetReadBenchmark repeatedly against the base code (before 
either PR), I see what looks like a 10% variation in results. The same is true 
with either of the PRs applied, so I will average out the runs.

As far as subclassing goes, I would expect performance on LE to remain the 
same, as the code path should be identical once the classes are instantiated. 
Performance on BE between the 2 approaches is another matter, but not the major 
concern at the moment, since we are going from failing/exceptions thrown to 
"working".





[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...

2016-04-19 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/12397#issuecomment-212024789
  
Alternative implementation in https://github.com/apache/spark/pull/12501





[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...

2016-04-19 Thread robbinspg
GitHub user robbinspg opened a pull request:

https://github.com/apache/spark/pull/12501

[SPARK-13745][SQL]Support columnar in memory representation on Big Endian 
platforms - implement by subclassing

## What changes were proposed in this pull request?

An alternative implementation of https://github.com/apache/spark/pull/12397 
which uses subclasses to minimize any potential performance hits on Little 
Endian 

Parquet datasource and ColumnarBatch tests fail on big-endian platforms. 
This patch adds support for the little-endian byte arrays being correctly 
interpreted on a big-endian platform.


## How was this patch tested?

Spark test builds ran on big endian z/Linux and regression build on little 
endian amd64



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/robbinspg/spark-1 bigEndianViaSubclass

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12501.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12501


commit 1fc048385fb0fea93eef85f614586448a3ea7c2a
Author: Pete Robbins <robbin...@gmail.com>
Date:   2016-04-14T13:50:34Z

Support columnar in memory representation on Big Endian platforms

commit 3eb481d8c30639c5b9a219e4891ccaccf73075b0
Author: Pete Robbins <robbin...@gmail.com>
Date:   2016-04-14T19:24:48Z

Use ByteBuffer.wrap instead of allocate

commit 69fc667266c5efe97f796c6b4e8d14470168867d
Author: Pete Robbins <robbin...@gmail.com>
Date:   2016-04-15T10:10:09Z

Fix offsets

commit a1f06106d321ca40bef6dcd7865484fd79976b08
Author: Pete Robbins <robbin...@gmail.com>
Date:   2016-04-15T11:55:21Z

Wrap byte array once

commit a652865e9f59ca4cf4fc596141ae0511284462b4
Author: Pete Robbins <robbin...@gmail.com>
Date:   2016-04-15T12:06:37Z

remove trailing spaces

commit 804740c9dbe3c4bbe8145cc119c997531634ebb1
Author: Pete Robbins <robbin...@gmail.com>
Date:   2016-04-18T15:05:48Z

Merge branch 'master' of https://github.com/apache/spark.git into 
apache-master

commit f109bda995a70be8787fd10f414a5be2125d97b2
Author: Pete Robbins <robbin...@gmail.com>
Date:   2016-04-19T09:04:41Z

Merge branch 'master' of https://github.com/apache/spark.git into 
apache-master

commit d7cbc84e1dfdae1036345956ea8f210c5c982b3d
Author: Pete Robbins <robbin...@gmail.com>
Date:   2016-04-19T14:13:27Z

Big endian implementation using subclassing

commit 648b7ac9fa0f7466e05151d5f07a1e613a682bfb
Author: Pete Robbins <robbin...@gmail.com>
Date:   2016-04-19T14:32:19Z

missing else clause







[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...

2016-04-19 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/12397#issuecomment-211952868
  
So I have run the ParquetReadBenchmark several times before this PR and 
after. I'm not sure how to interpret the results, though, as there is quite a 
variation in results on the same code base. I've also implemented the fix via 
subclassing, which should not affect the Little Endian code path at all, and 
will benchmark that and get back with the results.





[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...

2016-04-18 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/12397#issuecomment-211565728
  
will do





[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...

2016-04-18 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/12397#issuecomment-211563273
  
@rxin  So what do we need to do to get this into 2.0.0? Although the JIRA 
is of type "improvement", I could argue that it is a blocking defect, as Spark 
has supported Big Endian platforms up to now. I'm happy to re-write the patch 
to load separate subclassed implementations of the On/OffHeapColumnVector 
and VectorizedPlainValuesReader, but that is a far more complex fix and, given 
the code paths, I'd doubt any measurable performance change.





[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...

2016-04-18 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/12397#issuecomment-211247119
  
I haven't run any explicit performance tests for this. Do we have any 
specific to this area?

Using the static final boolean allows the JIT to eliminate the dead 
code path. I discussed an alternative implementation with @nongli in 
https://github.com/apache/spark/pull/10628#issuecomment-205993243 and the 
if(...) implementation was deemed ok.
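The pattern under discussion is a compile-time-constant guard: because the flag is static final, the JIT can prove one branch of the if dead and eliminate it. A minimal sketch (the method is illustrative, not the actual Spark code):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianGuard {
    // static final: a compile-time constant, so the JIT can treat the
    // untaken branch of the if below as dead code and eliminate it.
    private static final boolean bigEndianPlatform =
        ByteOrder.nativeOrder().equals(ByteOrder.BIG_ENDIAN);

    // Read a little-endian int from a byte[] regardless of platform.
    static int readLittleEndianInt(byte[] buf, int off) {
        int nativeRead = ByteBuffer.wrap(buf).order(ByteOrder.nativeOrder()).getInt(off);
        if (bigEndianPlatform) {
            return Integer.reverseBytes(nativeRead); // swap bytes on BE hardware
        } else {
            return nativeRead; // already little-endian on LE hardware
        }
    }

    public static void main(String[] args) {
        byte[] data = {0x78, 0x56, 0x34, 0x12}; // 0x12345678, little-endian
        System.out.println(Integer.toHexString(readLittleEndianInt(data, 0))); // prints 12345678
    }
}
```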





[GitHub] spark pull request: [WIP][SPARK-13745][SQL]Support columnar in mem...

2016-04-15 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/12397#issuecomment-210549840
  
@nongli 

So I've changed the patch to wrap the buffer in initFromPage, but only for 
Big Endian. I'd like to see this patch get into 2.0.0 so Big Endian platforms 
are not broken. We can discuss refactoring the code to always use a ByteBuffer 
rather than byte[] later?





[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...

2016-04-15 Thread robbinspg
Github user robbinspg commented on a diff in the pull request:

https://github.com/apache/spark/pull/12397#discussion_r59863075
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java
 ---
@@ -31,6 +33,8 @@
   private byte[] buffer;
   private int offset;
   private int bitOffset; // Only used for booleans.
+  
+  private final static boolean bigEndianPlatform = 
ByteOrder.nativeOrder().equals(ByteOrder.BIG_ENDIAN);
--- End diff --

So the first step is to get this functional on big endian without affecting 
the little endian implementation asap for inclusion in 2.0.0.

In VectorizedPlainValuesReader we could wrap the buffer once (assuming it 
never gets re-allocated) and use the ByteBuffer for the big endian access. We 
could always use the ByteBuffer rather than the byte[] even in little endian 
implementation but I do not know what performance impact that would have and as 
stated above I did not want to mess around with the little endian 
implementation at this time.

Of course the methods in on/offHeapColumnVector that take the byte[] as a 
parameter could also be changed to take the byte buffer instead.

There is also unnecessary subtracting/adding of 
Platform.BYTE_ARRAY_OFFSET around a lot of the method calls, e.g. 
  public final void readIntegers(int total, ColumnVector c, int rowId) {
c.putIntsLittleEndian(rowId, total, buffer, offset - 
Platform.BYTE_ARRAY_OFFSET);
offset += 4 * total;
  }

where the first thing that putIntsLittleEndian does is to add that back on 
to the offset.

So I think for now I'm reasonably happy with the big-endian support in 
this patch, and we could review the whole code structure later. I've 
improved the patch and may make the change to VectorizedPlainValuesReader 
so the code in there only wraps the buffer once.





[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...

2016-04-14 Thread robbinspg
Github user robbinspg commented on a diff in the pull request:

https://github.com/apache/spark/pull/12397#discussion_r59770275
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java
 ---
@@ -31,6 +33,8 @@
   private byte[] buffer;
   private int offset;
   private int bitOffset; // Only used for booleans.
+  
+  private final static boolean bigEndianPlatform = 
ByteOrder.nativeOrder().equals(ByteOrder.BIG_ENDIAN);
--- End diff --

In which method? Do you mean:


ByteBuffer.allocate(8).putDouble(v).order(ByteOrder.LITTLE_ENDIAN).getDouble(0);
vs
ByteBuffer.wrap(buffer).order(ByteOrder.LITTLE_ENDIAN).getDouble(offset);

to save the allocate?





[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...

2016-04-14 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/12397#issuecomment-209971909
  
@nongli please can you review this





[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...

2016-04-14 Thread robbinspg
GitHub user robbinspg opened a pull request:

https://github.com/apache/spark/pull/12397

[SPARK-13745][SQL]Support columnar in memory representation on Big Endian 
platforms

## What changes were proposed in this pull request?

The parquet datasource and ColumnarBatch tests fail on big-endian platforms. 
This patch adds support for the little-endian byte arrays being correctly 
interpreted on a big-endian platform.


## How was this patch tested?

Spark test builds were run on big-endian z/Linux, plus a regression build on 
little-endian amd64.





You can merge this pull request into a Git repository by running:

$ git pull https://github.com/robbinspg/spark-1 master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12397.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12397


commit 1fc048385fb0fea93eef85f614586448a3ea7c2a
Author: Pete Robbins <robbin...@gmail.com>
Date:   2016-04-14T13:50:34Z

Support columnar in memory representation on Big Endian platforms







[GitHub] spark pull request: [SPARK-12785][SQL] Add ColumnarBatch, an in me...

2016-04-14 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/10628#issuecomment-209832195
  
@nongli I'm just about there with a solution for Big Endian platforms and 
will be using https://issues.apache.org/jira/browse/SPARK-14151 for the changes.

I have one question:

It is clear from the tests using Parquet that the byte array passed into 
putIntsLittleEndian is in little-endian order. It is also the case that the 
byte arrays passed to putFloats and putDoubles have their values in little-endian 
order. Reversing the floats/doubles enables all the tests to pass.

In OffHeapColumnVector putDoubles(int rowId, int count, byte[] src, int 
srcIndex), if I assume the input is LE then the 
org.apache.spark.sql.execution.vectorized.ColumnarBatchSuite.Double APIs test 
fails. This is because it passes in a byte array of doubles that are in 
platform-endian order (created with Platform.putDouble).

My question is: are the byte arrays always in little-endian order? This seems to 
be true for the Parquet sources. If so, then I can modify the testcase 
'org.apache.spark.sql.execution.vectorized.ColumnarBatchSuite.Double APIs' to 
force the test data into LE.
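A minimal sketch of what "forcing the test data into LE" could look like (the helper names are hypothetical): encode each double into little-endian bytes explicitly rather than via Platform.putDouble, so the test array matches what the Parquet path produces regardless of the platform's native order:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ForceLittleEndianSketch {
    // Hypothetical helper: encode a double into little-endian bytes
    // (the order Parquet stores), independent of the platform's native order.
    static byte[] doubleToLEBytes(double v) {
        return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putDouble(v).array();
    }

    // Decode the same bytes back; round-trips on any endianness.
    static double leBytesToDouble(byte[] src, int offset) {
        return ByteBuffer.wrap(src).order(ByteOrder.LITTLE_ENDIAN).getDouble(offset);
    }

    public static void main(String[] args) {
        byte[] le = doubleToLEBytes(1.5);
        System.out.println(leBytesToDouble(le, 0)); // prints "1.5"
    }
}
```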





[GitHub] spark pull request: [SPARK-12785][SQL] Add ColumnarBatch, an in me...

2016-04-07 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/10628#issuecomment-206732654
  
We are actually seeing this issue in the OnHeap code: the byte array 
passed to putIntLittleEndian is in little-endian order, but the code tries to 
read it with Platform.getInt, which on a big-endian platform returns the 
wrong value because it assumes the bytes are in big-endian order.
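To illustrate the failure mode (a sketch only: a big-endian ByteBuffer stands in here for what Platform.getInt effectively does on a big-endian machine):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianMismatchSketch {
    // Stand-in for what Platform.getInt effectively does on a big-endian
    // machine: interpret the 4 bytes in big-endian order.
    static int getIntBigEndian(byte[] src, int offset) {
        return ByteBuffer.wrap(src).order(ByteOrder.BIG_ENDIAN).getInt(offset);
    }

    public static void main(String[] args) {
        byte[] le = {1, 0, 0, 0};            // the value 1, little-endian
        int wrong = getIntBigEndian(le, 0);  // 0x01000000 == 16777216, not 1
        int fixed = Integer.reverseBytes(wrong);
        System.out.println(wrong + " -> " + fixed); // prints "16777216 -> 1"
    }
}
```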





[GitHub] spark pull request: [SPARK-12785][SQL] Add ColumnarBatch, an in me...

2016-04-02 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/10628#issuecomment-204684533
  
So big-endian implementations of OffHeapColumnVector and 
OnHeapColumnVector are needed. I don't think we'd want an inline 'if 
(bigEndian)' in the relevant methods, so it may be that we'd want to subclass 
those classes, override the methods that require big-endian handling, and 
instantiate the big-endian version in ColumnVector.allocate if on a BE platform?

The xxxHeapColumnVector classes would have to lose the 'final' attribute.
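A sketch of the subclass-and-factory shape described above; the class and method names mirror the discussion but are illustrative, not Spark's actual code:

```java
import java.nio.ByteOrder;

public class ColumnVectorFactorySketch {
    // 'final' removed so the class can be extended, as discussed above.
    static class OnHeapColumnVector {
        void putIntsLittleEndian(int rowId, int total, byte[] src, int srcIndex) {
            // fast path: bytes are already in the platform's order
        }
    }

    static class BigEndianOnHeapColumnVector extends OnHeapColumnVector {
        @Override
        void putIntsLittleEndian(int rowId, int total, byte[] src, int srcIndex) {
            // byte-swapping path for big-endian platforms
        }
    }

    // The factory picks the implementation once at allocation time,
    // avoiding a per-call 'if (bigEndian)' branch in the hot methods.
    static OnHeapColumnVector allocate() {
        boolean bigEndian = ByteOrder.nativeOrder().equals(ByteOrder.BIG_ENDIAN);
        return bigEndian ? new BigEndianOnHeapColumnVector() : new OnHeapColumnVector();
    }

    public static void main(String[] args) {
        System.out.println(allocate().getClass().getSimpleName());
    }
}
```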





[GitHub] spark pull request: [SPARK-12470] [SQL] Fix size reduction calcula...

2016-01-05 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/10421#issuecomment-168963270
  
I have a fix for the test failure. Should I create a new Jira and PR?





[GitHub] spark pull request: [SPARK-12647][SQL] Fix o.a.s.sqlexecution.Exch...

2016-01-05 Thread robbinspg
GitHub user robbinspg opened a pull request:

https://github.com/apache/spark/pull/10599

[SPARK-12647][SQL] Fix 
o.a.s.sqlexecution.ExchangeCoordinatorSuite.determining the number of reducers: 
aggregate operator

change expected partition sizes

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/robbinspg/spark-1 branch-1.6

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10599.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10599


commit 841eed9acb4f2af3a8c63f80594f0d29e5a05669
Author: Pete Robbins <robbin...@gmail.com>
Date:   2016-01-05T10:07:57Z

Update expected partition size







[GitHub] spark pull request: [SPARK-12470] [SQL] Fix size reduction calcula...

2016-01-05 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/10421#issuecomment-168973101
  
created https://issues.apache.org/jira/browse/SPARK-12647 and associated PR





[GitHub] spark pull request: [SPARK-12647][SQL] Fix o.a.s.sqlexecution.Exch...

2016-01-05 Thread robbinspg
Github user robbinspg closed the pull request at:

https://github.com/apache/spark/pull/10599





[GitHub] spark pull request: [SPARK-12647][SQL] Fix o.a.s.sqlexecution.Exch...

2016-01-05 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/10599#issuecomment-169148536
  
I closed this as per request but it states "Closed with unmerged commits"





[GitHub] spark pull request: [SPARK-12470] [SQL] Fix size reduction calcula...

2016-01-04 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/10421#issuecomment-168919276
  
Merging this into the 1.6 stream has caused a test failure in 

org.apache.spark.sql.execution.ExchangeCoordinatorSuite.determining the 
number of reducers: aggregate operator

There was a change in master in the ExchangeCoordinatorSuite which set the 
expected partition sizes to a new value. I do not understand why the change in 
this PR affects the input partition sizes but it does.

I think this is a test issue rather than an issue with this PR. Should I 
raise a new Jira to fix the expected partition sizes?





[GitHub] spark pull request: [SPARK-12470] [SQL] Fix size reduction calcula...

2015-12-31 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/10421#issuecomment-168172679
  
re-merged with latest master

Please retest





[GitHub] spark pull request: [SPARK-12470] [SQL] Fix size reduction calcula...

2015-12-30 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/10421#issuecomment-168091202
  
Fixed scala style check

Please retest





[GitHub] spark pull request: [SPARK-12470] [SQL] Fix size reduction calcula...

2015-12-30 Thread robbinspg
Github user robbinspg commented on a diff in the pull request:

https://github.com/apache/spark/pull/10421#discussion_r48599743
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeRowJoiner.scala
 ---
@@ -171,7 +171,7 @@ object GenerateUnsafeRowJoiner extends 
CodeGenerator[(StructType, StructType), U
|// row1: ${schema1.size} fields, $bitset1Words words in bitset
|// row2: ${schema2.size}, $bitset2Words words in bitset
|// output: ${schema1.size + schema2.size} fields, 
$outputBitsetWords words in bitset
-   |final int sizeInBytes = row1.getSizeInBytes() + 
row2.getSizeInBytes();
+   |final int sizeInBytes = row1.getSizeInBytes() + 
row2.getSizeInBytes() - ($sizeReduction * 8);
--- End diff --

OK, sorted. Can this be run by the Jenkins tests?





[GitHub] spark pull request: [SPARK-12470] [SQL] Fix size reduction calcula...

2015-12-28 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/10421#issuecomment-167738123
  
@rxin as the original author of this code could you please review the PR?





[GitHub] spark pull request: [SPARK-12470] Fix size reduction calculation

2015-12-22 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/10421#issuecomment-166585412
  
I believe this is uncovering a test failure in ExchangeCoordinatorSuite, so 
please hold this PR until I investigate further.

- determining the number of reducers: aggregate operator *** FAILED ***
  3 did not equal 2 (ExchangeCoordinatorSuite.scala:316)






[GitHub] spark pull request: [SPARK-12470] Fix size reduction calculation

2015-12-21 Thread robbinspg
GitHub user robbinspg opened a pull request:

https://github.com/apache/spark/pull/10421

[SPARK-12470] Fix size reduction calculation

also only allocate required buffer size

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/robbinspg/spark-1 master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10421.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10421


commit 440cc51076d0f35cb30e8c7a6caf0d4607ba78bd
Author: Pete Robbins <robbin...@gmail.com>
Date:   2015-12-21T22:21:21Z

Fix size reduction calculation







[GitHub] spark pull request: [SPARK-9710] [test] Fix RPackageUtilsSuite whe...

2015-09-23 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/8008#issuecomment-142583322
  
My 1.5 branch build is failing as described in SPARK-9710, and I notice 
that this merge didn't make it into that branch. Any chance this will be 
backported?





[GitHub] spark pull request: [SPARK-10454][Spark Core] wait for empty event...

2015-09-04 Thread robbinspg
GitHub user robbinspg opened a pull request:

https://github.com/apache/spark/pull/8605

[SPARK-10454][Spark Core] wait for empty event queue



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/robbinspg/spark-1 DAGSchedulerSuite-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8605.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8605


commit 24438b3f6a870162c79cded14716ee5828bdf9a6
Author: robbins <robb...@uk.ibm.com>
Date:   2015-09-04T20:28:29Z

wait for empty event queue







[GitHub] spark pull request: [SPARK-9869][Streaming] Wait for all event not...

2015-09-03 Thread robbinspg
GitHub user robbinspg opened a pull request:

https://github.com/apache/spark/pull/8589

[SPARK-9869][Streaming] Wait for all event notifications before asserting 
results



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/robbinspg/spark-1 InputStreamSuite-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8589.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8589


commit f1df02b6834bed8dd32fc1382617c278a777cac4
Author: robbins <robb...@uk.ibm.com>
Date:   2015-09-03T17:15:56Z

Wait for all event notifications before asserting results







[GitHub] spark pull request: [SPARK-10431][ Spark Core ] Fix intermittent t...

2015-09-03 Thread robbinspg
GitHub user robbinspg opened a pull request:

https://github.com/apache/spark/pull/8582

[SPARK-10431][ Spark Core ] Fix intermittent test failure. Wait for event 
queue to be clear



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/robbinspg/spark-1 InputOutputMetricsSuite

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8582.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8582


commit 3f8f21d1a42a8b8fbd87d134821c0afd5cb4216a
Author: robbins <robb...@uk.ibm.com>
Date:   2015-09-02T15:31:34Z

Fix intermittent test failure. Wait for event queue to be clear







[GitHub] spark pull request: [SPARK-10431][ Spark Core ] Fix intermittent t...

2015-09-03 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/8582#issuecomment-137432486
  
I see the test failure is https://issues.apache.org/jira/browse/SPARK-9869, 
which I'm sure is not related to this pull request.

Ironically, looking at SPARK-9869 it looks like it could be a very similar 
issue to this fix, i.e. waiting for the listenerBus to be empty before the asserts!

