[jira] [Resolved] (PARQUET-725) Parquet AVRO tests fail when debug logging is enabled

2017-12-12 Thread Niels Basjes (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niels Basjes resolved PARQUET-725.
--
Resolution: Fixed

I found that the upgrade to Avro 1.8.2 has been done in PARQUET-1149.
I verified that the problem described here no longer occurs.

> Parquet AVRO tests fail when debug logging is enabled
> -
>
> Key: PARQUET-725
> URL: https://issues.apache.org/jira/browse/PARQUET-725
> Project: Parquet
>  Issue Type: Bug
>Reporter: Niels Basjes
>Assignee: Niels Basjes
>
> I found that on my machine some of the tests in the parquet-avro fail.
> {code}
> Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.073 sec
> Running org.apache.parquet.avro.TestAvroDataSupplier
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec
> Running org.apache.parquet.avro.TestReadWrite
> Tests run: 18, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 0.414 sec 
> <<< FAILURE!
> Running org.apache.parquet.avro.TestBackwardCompatibility
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.016 sec
> Running org.apache.parquet.avro.TestReadWriteOldListBehavior
> Tests run: 16, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 0.148 sec 
> <<< FAILURE!
> Running org.apache.parquet.avro.TestInputOutputFormat
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.29 sec
> Running org.apache.parquet.avro.TestReflectLogicalTypes
> Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.165 sec
> Running org.apache.parquet.avro.TestCircularReferences
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0 sec
> Results :
> Failed tests:   
> testWriteReflectReadGeneric(org.apache.parquet.avro.TestReflectReadWrite): 
> expected:<{"myboolean": true, "mybyte": 1, "myshort": 1, "myint": 1, 
> "mylong": 2, "myfloat": 3.1, "mydouble": 4.1, "mybytes": {"bytes": 
> "\u0001\u0002\u0003\u0004"}, "mystring": "Hello", "myenum": "A", "mymap": 
> {"a": "1", "b": "2"}, "myshortarray": [1, 2], "myintarray": [1, 2], 
> "mystringarray": ["a", "b"], "mylist": ["a", "b", "c"]}> but 
> was:<{"myboolean": true, "mybyte": 1, "myshort": 1, "myint": 1, "mylong": 2, 
> "myfloat": 3.1, "mydouble": 4.1, "mybytes": {"bytes": ""}, "mystring": 
> "Hello", "myenum": "A", "mymap": {"a": "1", "b": "2"}, "myshortarray": [1, 
> 2], "myintarray": [1, 2], "mystringarray": ["a", "b"], "mylist": ["a", "b", 
> "c"]}>
>   testWriteDecimalBytes(org.apache.parquet.avro.TestGenericLogicalTypes): 
> Should read BigDecimals as bytes expected:<[{"dec": {"bytes": "ò\u0096"}}, 
> {"dec": {"bytes": "\u²àø"}}]> but was:<[{"dec": {"bytes": ""}}, {"dec": 
> {"bytes": ""}}]>
>   testAll[0](org.apache.parquet.avro.TestReadWrite): 
> expected: but 
> was:
>   testAllUsingDefaultAvroSchema[0](org.apache.parquet.avro.TestReadWrite): 
> expected: but 
> was:
>   testAll[1](org.apache.parquet.avro.TestReadWrite): 
> expected: but 
> was:
>   testAllUsingDefaultAvroSchema[1](org.apache.parquet.avro.TestReadWrite): 
> expected: but 
> was:
>   testAll[0](org.apache.parquet.avro.TestReadWriteOldListBehavior): 
> expected: but 
> was:
>   
> testAllUsingDefaultAvroSchema[0](org.apache.parquet.avro.TestReadWriteOldListBehavior):
>  expected: but 
> was:
>   testAll[1](org.apache.parquet.avro.TestReadWriteOldListBehavior): 
> expected: but 
> was:
>   
> testAllUsingDefaultAvroSchema[1](org.apache.parquet.avro.TestReadWriteOldListBehavior):
>  expected: but 
> was:
> {code}
> I see two classes of problems:
> # The JSON output that contains byte arrays appears different.
> # Some tests compare the 'toString' of a ByteBuffer. For two ByteBuffers 
> that both contain the SAME bytes, these tests fail simply because the position 
> field of the ByteBuffer is different. I think these should compare the 
> contents of the ByteBuffer instead.
> {code}
>  but 
> was:
> {code}
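
A minimal sketch of the content-based comparison suggested in the description above. It uses only java.nio; the class and method names (ByteBufferContentCheck, sameContents) are hypothetical and do not come from the Parquet test code. It assumes "contents" means every byte from index 0 up to the limit, regardless of the current position.

```java
import java.nio.ByteBuffer;

public class ByteBufferContentCheck {

    /** Compares the bytes between index 0 and limit, ignoring each buffer's current position. */
    public static boolean sameContents(ByteBuffer a, ByteBuffer b) {
        // duplicate() shares the bytes but has its own position, so the callers' buffers are untouched.
        ByteBuffer x = a.duplicate();
        ByteBuffer y = b.duplicate();
        x.rewind();
        y.rewind();
        // ByteBuffer.equals compares the remaining bytes, which after rewind() is everything up to limit.
        return x.equals(y);
    }

    public static void main(String[] args) {
        ByteBuffer expected = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});
        ByteBuffer actual = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});
        actual.position(4); // simulate a buffer that something has already read to the end

        // The toString() of a ByteBuffer includes pos/lim/cap, so it differs here.
        System.out.println(expected.toString().equals(actual.toString())); // false
        System.out.println(sameContents(expected, actual));                 // true
    }
}
```

Comparing through duplicates keeps the check free of side effects, so an assertion written this way cannot itself move the position of the buffers under test.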



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PARQUET-725) Parquet AVRO tests fail when debug logging is enabled

2016-11-29 Thread Niels Basjes (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15704638#comment-15704638
 ] 

Niels Basjes commented on PARQUET-725:
--

Looks like the date problem has been picked up in PARQUET-765.
To fix the debug logging problem Avro 1.8.2 is needed.



[jira] [Commented] (PARQUET-725) Parquet AVRO tests fail when debug logging is enabled

2016-10-19 Thread Niels Basjes (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15588034#comment-15588034
 ] 

Niels Basjes commented on PARQUET-725:
--

Easy fix:
{code}diff --git a/parquet-avro/src/test/resources/car.avdl 
b/parquet-avro/src/test/resources/car.avdl
index b848da5..1f459a3 100644
--- a/parquet-avro/src/test/resources/car.avdl
+++ b/parquet-avro/src/test/resources/car.avdl
@@ -21,7 +21,7 @@
 protocol Cars {
 
 record Service {
-long date;
+long `date`;
 string mechanic;
 }
 
{code}


[jira] [Updated] (PARQUET-725) Parquet AVRO tests fail when debug logging is enabled

2016-10-16 Thread Niels Basjes (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niels Basjes updated PARQUET-725:
-
Summary: Parquet AVRO tests fail when debug logging is enabled  (was: 
Parquet AVRO tests fail)



[jira] [Updated] (PARQUET-740) Introduce editorconfig

2016-10-06 Thread Niels Basjes (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niels Basjes updated PARQUET-740:
-
Description: 
Editor config is a very easy way of ensuring that developers adhere more 
closely to the same coding standards when it comes to using tabs/spaces, 
trailing spaces, line endings, etc.

Quote from http://editorconfig.org/
{quote}
EditorConfig helps developers define and maintain consistent coding styles 
between different editors and IDEs. The EditorConfig project consists of a file 
format for defining coding styles and a collection of text editor plugins that 
enable editors to read the file format and adhere to defined styles. 
EditorConfig files are easily readable and they work nicely with version 
control systems.
{quote}



  was:Editor config is a very easy way of ensuring that developers adhere more 
closely to the same coding standards when it comes to using tabs/spaces , 
trailing spaces, end of lines etc.


> Introduce editorconfig
> --
>
> Key: PARQUET-740
> URL: https://issues.apache.org/jira/browse/PARQUET-740
> Project: Parquet
>  Issue Type: Improvement
>Reporter: Niels Basjes
>Assignee: Niels Basjes
>
> Editor config is a very easy way of ensuring that developers adhere more 
> closely to the same coding standards when it comes to using tabs/spaces , 
> trailing spaces, end of lines etc.
> Quote from http://editorconfig.org/
> {quote}
> EditorConfig helps developers define and maintain consistent coding styles 
> between different editors and IDEs. The EditorConfig project consists of a 
> file format for defining coding styles and a collection of text editor 
> plugins that enable editors to read the file format and adhere to defined 
> styles. EditorConfig files are easily readable and they work nicely with 
> version control systems.
> {quote}





[jira] [Created] (PARQUET-740) Introduce editorconfig

2016-10-06 Thread Niels Basjes (JIRA)
Niels Basjes created PARQUET-740:


 Summary: Introduce editorconfig
 Key: PARQUET-740
 URL: https://issues.apache.org/jira/browse/PARQUET-740
 Project: Parquet
  Issue Type: Improvement
Reporter: Niels Basjes
Assignee: Niels Basjes


Editor config is a very easy way of ensuring that developers adhere more 
closely to the same coding standards when it comes to using tabs/spaces, 
trailing spaces, line endings, etc.





[jira] [Commented] (PARQUET-725) Parquet AVRO tests fail

2016-09-29 Thread Niels Basjes (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532822#comment-15532822
 ] 

Niels Basjes commented on PARQUET-725:
--

Note that AVRO 1.8.2 (as it currently stands) is likely to introduce the 
problem described in AVRO-1924, because the affected construct is used in the 
file parquet-avro/src/test/resources/car.avdl



[jira] [Commented] (PARQUET-725) Parquet AVRO tests fail

2016-09-29 Thread Niels Basjes (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532225#comment-15532225
 ] 

Niels Basjes commented on PARQUET-725:
--

Once AVRO 1.8.2 has been released, this problem can easily be fixed by 
changing the version of AVRO in the pom.xml.

So this issue will eventually be fixed with:
{code}
diff --git pom.xml pom.xml
index ca34309..e1f0cfd 100644
--- pom.xml
+++ pom.xml
@@ -86,7 +86,7 @@
 6.5.7
 0.9.33
 1.7.5
-1.8.0
+1.8.2
 11.0
 1.9.5
   
{code}


[jira] [Assigned] (PARQUET-725) Parquet AVRO tests fail

2016-09-28 Thread Niels Basjes (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niels Basjes reassigned PARQUET-725:


Assignee: Niels Basjes



[jira] [Commented] (PARQUET-727) Ensure correct version of thrift is used

2016-09-26 Thread Niels Basjes (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15522718#comment-15522718
 ] 

Niels Basjes commented on PARQUET-727:
--

[~julienledem] [~rdblue] Can you guys please add me as a contributor in Jira so 
I can assign Parquet issues to myself? Thanks.


> Ensure correct version of thrift is used
> 
>
> Key: PARQUET-727
> URL: https://issues.apache.org/jira/browse/PARQUET-727
> Project: Parquet
>  Issue Type: Improvement
>Reporter: Niels Basjes
>
> I found that if you have the wrong version of thrift in your path during the 
> build the errors you get are very obscure and verbose.
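
One way to make such a mismatch fail early with a readable message is sketched below. This is only an illustration, not what the Parquet build does: the class name ThriftVersionCheck is hypothetical, the required version is passed in by the caller because this issue does not state it, and the sketch assumes that `thrift -version` prints a line of the form "Thrift version X.Y.Z".

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class ThriftVersionCheck {

    /** Extracts the version from output such as "Thrift version 0.9.3" (format is an assumption). */
    public static String parseVersion(String versionOutput) {
        String prefix = "Thrift version ";
        String trimmed = versionOutput.trim();
        if (!trimmed.startsWith(prefix)) {
            throw new IllegalArgumentException("Unrecognized thrift version output: " + trimmed);
        }
        return trimmed.substring(prefix.length()).trim();
    }

    /** Fails with a short, explicit message when the found version is not the required one. */
    public static void requireVersion(String versionOutput, String required) {
        String found = parseVersion(versionOutput);
        if (!found.equals(required)) {
            throw new IllegalStateException(
                "Found thrift " + found + " on the PATH but the build needs " + required);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: ThriftVersionCheck <required-version>");
            return;
        }
        // Runs whichever thrift executable is first on the PATH.
        Process p = new ProcessBuilder("thrift", "-version").redirectErrorStream(true).start();
        String firstLine;
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            firstLine = r.readLine();
        }
        p.waitFor();
        requireVersion(firstLine == null ? "" : firstLine, args[0]);
        System.out.println("thrift version OK");
    }
}
```

Running a check like this before code generation would replace the obscure, verbose errors with a single line that names both the version found and the version expected.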





[jira] [Commented] (PARQUET-725) Parquet AVRO tests fail

2016-09-23 Thread Niels Basjes (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516632#comment-15516632
 ] 

Niels Basjes commented on PARQUET-725:
--

So the root cause is AVRO-1799 which surfaced after changing the logging setup 
in PARQUET-423.

For now I pushed a default log4j.properties that is set to INFO logging.
This passed the build. Set it to DEBUG and the build will fail.

The action that remains for this issue is to upgrade to AVRO 1.8.2 (as soon as 
it is released).
If you don't, any AVRO-related data that gets logged is likely to be 
modified by the logging itself.


> Parquet AVRO tests fail
> ---
>
> Key: PARQUET-725
> URL: https://issues.apache.org/jira/browse/PARQUET-725
> Project: Parquet
>  Issue Type: Bug
>Reporter: Niels Basjes
>

[jira] [Comment Edited] (PARQUET-725) Parquet AVRO tests fail

2016-09-23 Thread Niels Basjes (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516512#comment-15516512
 ] 

Niels Basjes edited comment on PARQUET-725 at 9/23/16 2:00 PM:
---

Found the root cause (fixed in a yet to be released version of AVRO): 
AVRO-1799:  java: GenericData.toString() mutates underlying ByteBuffer backed 
data

This is also the reason this problem did not occur in my IDE (IntelliJ).
Under the hood, the debugger does a 'toString' to show the record on the screen 
during debugging.
Because this was done on both records, the 'equals' a step later would succeed, 
while in a normal run it would fail.
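
To illustrate the kind of side effect described in AVRO-1799, here is a minimal sketch using only java.nio. It does not reproduce Avro's actual GenericData.toString() code; the class and the two helper methods are hypothetical names made up for this example. It only shows how rendering a ByteBuffer with relative reads consumes it, and how rendering a duplicate leaves the original untouched.

```java
import java.nio.ByteBuffer;

public class ByteBufferToStringDemo {

    // Hypothetical rendering helper: it reads with relative get(), which
    // advances the position of the buffer it was handed.
    static String renderMutating(ByteBuffer buffer) {
        StringBuilder out = new StringBuilder();
        while (buffer.hasRemaining()) {
            out.append(String.format("%02x", buffer.get() & 0xff));
        }
        return out.toString();
    }

    // Same rendering, but on a duplicate: duplicate() shares the bytes and
    // has its own position, so the caller's buffer is not consumed.
    static String renderSafely(ByteBuffer buffer) {
        return renderMutating(buffer.duplicate());
    }

    public static void main(String[] args) {
        ByteBuffer bytes = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});

        System.out.println(renderSafely(bytes));    // 01020304
        System.out.println(bytes.remaining());      // 4, still readable

        System.out.println(renderMutating(bytes));  // 01020304
        System.out.println(bytes.remaining());      // 0, buffer was consumed

        // A second rendering now sees no bytes at all, which resembles the
        // empty "bytes" values in the failed assertions.
        System.out.println(renderMutating(bytes).isEmpty()); // true
    }
}
```

This would also be consistent with the debugger observation: once something has rendered both records, both buffers are in the same consumed state, so a later comparison can succeed.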


was (Author: nielsbasjes):
Found the probable root cause (fixed in a yet to be released version of AVRO): 
AVRO-1799:  java: GenericData.toString() mutates underlying ByteBuffer backed 
data

> Parquet AVRO tests fail
> ---
>
> Key: PARQUET-725
> URL: https://issues.apache.org/jira/browse/PARQUET-725
> Project: Parquet
>  Issue Type: Bug
>Reporter: Niels Basjes
>

[jira] [Commented] (PARQUET-725) Parquet AVRO tests fail

2016-09-23 Thread Niels Basjes (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516512#comment-15516512
 ] 

Niels Basjes commented on PARQUET-725:
--

Found the probable root cause (fixed in a yet to be released version of AVRO): 
AVRO-1799:  java: GenericData.toString() mutates underlying ByteBuffer backed 
data

> Parquet AVRO tests fail
> ---
>
> Key: PARQUET-725
> URL: https://issues.apache.org/jira/browse/PARQUET-725
> Project: Parquet
>  Issue Type: Bug
>Reporter: Niels Basjes
>





[jira] [Created] (PARQUET-727) Ensure correct version of thrift is used

2016-09-23 Thread Niels Basjes (JIRA)
Niels Basjes created PARQUET-727:


 Summary: Ensure correct version of thrift is used
 Key: PARQUET-727
 URL: https://issues.apache.org/jira/browse/PARQUET-727
 Project: Parquet
  Issue Type: Improvement
Reporter: Niels Basjes


I found that if you have the wrong version of thrift in your path during the 
build the errors you get are very obscure and verbose.





[jira] [Created] (PARQUET-725) Parquet AVRO tests fail

2016-09-23 Thread Niels Basjes (JIRA)
Niels Basjes created PARQUET-725:


 Summary: Parquet AVRO tests fail
 Key: PARQUET-725
 URL: https://issues.apache.org/jira/browse/PARQUET-725
 Project: Parquet
  Issue Type: Bug
Reporter: Niels Basjes


I found that on my machine some of the tests in parquet-avro fail.

{code}
Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.073 sec
Running org.apache.parquet.avro.TestAvroDataSupplier
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec
Running org.apache.parquet.avro.TestReadWrite
Tests run: 18, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 0.414 sec <<< 
FAILURE!
Running org.apache.parquet.avro.TestBackwardCompatibility
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.016 sec
Running org.apache.parquet.avro.TestReadWriteOldListBehavior
Tests run: 16, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 0.148 sec <<< 
FAILURE!
Running org.apache.parquet.avro.TestInputOutputFormat
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.29 sec
Running org.apache.parquet.avro.TestReflectLogicalTypes
Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.165 sec
Running org.apache.parquet.avro.TestCircularReferences
Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0 sec

Results :

Failed tests:   
testWriteReflectReadGeneric(org.apache.parquet.avro.TestReflectReadWrite): 
expected:<{"myboolean": true, "mybyte": 1, "myshort": 1, "myint": 1, "mylong": 
2, "myfloat": 3.1, "mydouble": 4.1, "mybytes": {"bytes": 
"\u0001\u0002\u0003\u0004"}, "mystring": "Hello", "myenum": "A", "mymap": {"a": 
"1", "b": "2"}, "myshortarray": [1, 2], "myintarray": [1, 2], "mystringarray": 
["a", "b"], "mylist": ["a", "b", "c"]}> but was:<{"myboolean": true, "mybyte": 
1, "myshort": 1, "myint": 1, "mylong": 2, "myfloat": 3.1, "mydouble": 4.1, 
"mybytes": {"bytes": ""}, "mystring": "Hello", "myenum": "A", "mymap": {"a": 
"1", "b": "2"}, "myshortarray": [1, 2], "myintarray": [1, 2], "mystringarray": 
["a", "b"], "mylist": ["a", "b", "c"]}>
  testWriteDecimalBytes(org.apache.parquet.avro.TestGenericLogicalTypes): 
Should read BigDecimals as bytes expected:<[{"dec": {"bytes": "ò\u0096"}}, 
{"dec": {"bytes": "\u²àø"}}]> but was:<[{"dec": {"bytes": ""}}, {"dec": 
{"bytes": ""}}]>
  testAll[0](org.apache.parquet.avro.TestReadWrite): 
expected: but 
was:
  testAllUsingDefaultAvroSchema[0](org.apache.parquet.avro.TestReadWrite): 
expected: but 
was:
  testAll[1](org.apache.parquet.avro.TestReadWrite): 
expected: but 
was:
  testAllUsingDefaultAvroSchema[1](org.apache.parquet.avro.TestReadWrite): 
expected: but 
was:
  testAll[0](org.apache.parquet.avro.TestReadWriteOldListBehavior): 
expected: but 
was:
  
testAllUsingDefaultAvroSchema[0](org.apache.parquet.avro.TestReadWriteOldListBehavior):
 expected: but 
was:
  testAll[1](org.apache.parquet.avro.TestReadWriteOldListBehavior): 
expected: but 
was:
  
testAllUsingDefaultAvroSchema[1](org.apache.parquet.avro.TestReadWriteOldListBehavior):
 expected: but 
was:

{code}

I see two classes of problems:
# The JSON with byte arrays appears different.
# Some tests compare the 'toString' of a ByteBuffer. For two ByteBuffers 
that both contain the SAME bytes, these tests fail simply because the position 
field of the ByteBuffer is different. I think these should compare the contents 
of the ByteBuffer instead.
{code}
 but 
was:
{code}
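
The second class of problem can be sketched with plain java.nio. This is not the actual test code from parquet-avro; the class and the helpers contents and sameContents are hypothetical. The sketch assumes that "contents" means every byte from index 0 up to the limit, regardless of the current position.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class ByteBufferContentCompare {

    // Copies all bytes from index 0 up to the limit, ignoring the current
    // position. Works on a duplicate so the argument is not modified.
    static byte[] contents(ByteBuffer buffer) {
        ByteBuffer view = buffer.duplicate();
        view.rewind();
        byte[] copy = new byte[view.remaining()];
        view.get(copy);
        return copy;
    }

    // Position-independent comparison of two buffers.
    static boolean sameContents(ByteBuffer a, ByteBuffer b) {
        return Arrays.equals(contents(a), contents(b));
    }

    public static void main(String[] args) {
        ByteBuffer fresh = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});
        ByteBuffer consumed = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});
        consumed.position(consumed.limit()); // same bytes, different position

        // toString() includes the position, so the strings differ.
        System.out.println(fresh.toString().equals(consumed.toString())); // false
        // ByteBuffer.equals() only looks at the remaining bytes, so it differs too.
        System.out.println(fresh.equals(consumed));                       // false
        // Comparing the copied contents ignores the position.
        System.out.println(sameContents(fresh, consumed));                // true
    }
}
```

Note that switching the assertion from 'toString' to ByteBuffer.equals() alone would not help here, because equals() also depends on the position.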









[jira] [Created] (PARQUET-722) Building with JDK 8 fails over a maven bug

2016-09-21 Thread Niels Basjes (JIRA)
Niels Basjes created PARQUET-722:


 Summary: Building with JDK 8 fails over a maven bug
 Key: PARQUET-722
 URL: https://issues.apache.org/jira/browse/PARQUET-722
 Project: Parquet
  Issue Type: Bug
Reporter: Niels Basjes


When I build parquet on my system I get this error during the build:
{quote}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process (default) on 
project parquet-generator: Error rendering velocity resource. 
NullPointerException -> [Help 1]
{quote}

About a year ago [~julienledem] responded on this page that this is caused by a 
bug in Maven in combination with Java 8:
http://stackoverflow.com/questions/31229445/build-failure-apache-parquet-mr-source-mvn-install-failure/33360512#33360512

This bug has since been solved at the Maven end in maven-filtering 1.2:
https://issues.apache.org/jira/browse/MSHARED-319

The problem is that this fix has not yet been integrated into the latest 
available Maven versions.

I'll put up a pull request with a proposed fix for this.








[jira] [Created] (PARQUET-423) Make writing Avro to Parquet less noisy

2016-01-12 Thread Niels Basjes (JIRA)
Niels Basjes created PARQUET-423:


 Summary: Make writing Avro to Parquet less noisy
 Key: PARQUET-423
 URL: https://issues.apache.org/jira/browse/PARQUET-423
 Project: Parquet
  Issue Type: Improvement
  Components: parquet-avro
Affects Versions: 1.8.0
Reporter: Niels Basjes
Priority: Minor


When writing Avro files to disk using the AvroParquetWriter, some statistics 
are written to the logging system for each column in the file.
When writing files based on a large Avro schema, the output of this logging is 
often no longer useful and becomes a hassle.

Because the logging level is hardcoded (why?) into the parquet library, I would 
like to introduce a switch that allows enabling/disabling this type of logging.

{code}
Jan 12, 2016 1:43:00 PM INFO: 
org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 90B for 
[IPAddress] BINARY: 60 values, 26B raw, 47B comp, 1 pages, encodings: 
[RLE_DICTIONARY, PLAIN], dic { 7 entries, 77B raw, 7B comp}
Jan 12, 2016 1:43:00 PM INFO: 
org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 102B for [country] 
BINARY: 60 values, 26B raw, 47B comp, 1 pages, encodings: [RLE_DICTIONARY, 
PLAIN], dic { 7 entries, 119B raw, 7B comp}
Jan 12, 2016 1:43:00 PM INFO: 
org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 152B for 
[windowid] BINARY: 60 values, 33B raw, 51B comp, 1 pages, encodings: 
[RLE_DICTIONARY, PLAIN], dic { 12 entries, 480B raw, 12B comp}
Jan 12, 2016 1:43:00 PM INFO: 
org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 77B for 
[customerId] BINARY: 58 values, 22B raw, 42B comp, 1 pages, encodings: 
[RLE_DICTIONARY, PLAIN], dic { 7 entries, 49B raw, 7B comp}
Jan 12, 2016 1:43:00 PM INFO: 
org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 86B for 
[sessionId] BINARY: 58 values, 28B raw, 43B comp, 1 pages, encodings: 
[RLE_DICTIONARY, PLAIN], dic { 10 entries, 110B raw, 10B comp}
Jan 12, 2016 1:43:00 PM INFO: 
org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 93B for 
[sessionEventNr] INT64: 58 values, 34B raw, 48B comp, 1 pages, encodings: 
[RLE_DICTIONARY, PLAIN], dic { 14 entries, 112B raw, 14B comp}
Jan 12, 2016 1:43:00 PM INFO: 
org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 114B for [visitId] 
BINARY: 58 values, 28B raw, 43B comp, 1 pages, encodings: [RLE_DICTIONARY, 
PLAIN], dic { 10 entries, 250B raw, 10B comp}
Jan 12, 2016 1:43:00 PM INFO: 
org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 90B for 
[visitEventNr] INT64: 58 values, 34B raw, 45B comp, 1 pages, encodings: 
[RLE_DICTIONARY, PLAIN], dic { 11 entries, 88B raw, 11B comp}
Jan 12, 2016 1:43:00 PM INFO: 
org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 112B for 
[timestamp] INT64: 58 values, 50B raw, 66B comp, 1 pages, encodings: 
[RLE_DICTIONARY, PLAIN], dic { 46 entries, 368B raw, 46B comp}
Jan 12, 2016 1:43:00 PM INFO: 
org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 85B for 
[IPAddress] BINARY: 58 values, 22B raw, 42B comp, 1 pages, encodings: 
[RLE_DICTIONARY, PLAIN], dic { 7 entries, 77B raw, 7B comp}
Jan 12, 2016 1:43:00 PM INFO: 
org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 97B for [country] 
BINARY: 58 values, 22B raw, 42B comp, 1 pages, encodings: [RLE_DICTIONARY, 
PLAIN], dic { 7 entries, 119B raw, 7B comp}
Jan 12, 2016 1:43:00 PM INFO: 
org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 144B for 
[windowid] BINARY: 58 values, 28B raw, 43B comp, 1 pages, encodings: 
[RLE_DICTIONARY, PLAIN], dic { 10 entries, 400B raw, 10B comp}
{code}
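
A rough sketch of what such a switch could look like, assuming the messages above are emitted through a java.util.logging logger named after the class in the output, and assuming nothing else overrides that logger's level. Since the level is currently hardcoded in the library, this may well have no effect on Parquet 1.8.0 as-is; the class name QuietColumnStats and the method setColumnStatsLogging are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietColumnStats {

    // Logger name copied from the log output above. Keep a strong reference:
    // java.util.logging only holds loggers weakly, so a level set on an
    // unreferenced logger can be lost.
    static final Logger COLUMN_STATS =
        Logger.getLogger("org.apache.parquet.hadoop.ColumnChunkPageWriteStore");

    // Hypothetical switch: when disabled, INFO messages from this logger are
    // filtered out while warnings and errors still get through.
    static void setColumnStatsLogging(boolean enabled) {
        COLUMN_STATS.setLevel(enabled ? Level.INFO : Level.WARNING);
    }

    public static void main(String[] args) {
        setColumnStatsLogging(false);
        System.out.println(COLUMN_STATS.isLoggable(Level.INFO));    // false
        System.out.println(COLUMN_STATS.isLoggable(Level.WARNING)); // true

        setColumnStatsLogging(true);
        System.out.println(COLUMN_STATS.isLoggable(Level.INFO));    // true
    }
}
```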





