[jira] [Resolved] (PARQUET-725) Parquet AVRO tests fail when debug logging is enabled
[ https://issues.apache.org/jira/browse/PARQUET-725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Niels Basjes resolved PARQUET-725.
----------------------------------
Resolution: Fixed

I found that the upgrade to Avro 1.8.2 has been done in PARQUET-1149.
I verified and the problem described here no longer occurs.

> Parquet AVRO tests fail when debug logging is enabled
> -----------------------------------------------------
>
> Key: PARQUET-725
> URL: https://issues.apache.org/jira/browse/PARQUET-725
> Project: Parquet
> Issue Type: Bug
> Reporter: Niels Basjes
> Assignee: Niels Basjes
>
> I found that on my machine some of the tests in the parquet-avro fail.
> {code}
> Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.073 sec
> Running org.apache.parquet.avro.TestAvroDataSupplier
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec
> Running org.apache.parquet.avro.TestReadWrite
> Tests run: 18, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 0.414 sec <<< FAILURE!
> Running org.apache.parquet.avro.TestBackwardCompatibility
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.016 sec
> Running org.apache.parquet.avro.TestReadWriteOldListBehavior
> Tests run: 16, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 0.148 sec <<< FAILURE!
> Running org.apache.parquet.avro.TestInputOutputFormat
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.29 sec
> Running org.apache.parquet.avro.TestReflectLogicalTypes
> Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.165 sec
> Running org.apache.parquet.avro.TestCircularReferences
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0 sec
> Results :
> Failed tests:
> testWriteReflectReadGeneric(org.apache.parquet.avro.TestReflectReadWrite):
>   expected:<{"myboolean": true, "mybyte": 1, "myshort": 1, "myint": 1, "mylong": 2, "myfloat": 3.1, "mydouble": 4.1, "mybytes": {"bytes": "\u0001\u0002\u0003\u0004"}, "mystring": "Hello", "myenum": "A", "mymap": {"a": "1", "b": "2"}, "myshortarray": [1, 2], "myintarray": [1, 2], "mystringarray": ["a", "b"], "mylist": ["a", "b", "c"]}>
>   but was:<{"myboolean": true, "mybyte": 1, "myshort": 1, "myint": 1, "mylong": 2, "myfloat": 3.1, "mydouble": 4.1, "mybytes": {"bytes": ""}, "mystring": "Hello", "myenum": "A", "mymap": {"a": "1", "b": "2"}, "myshortarray": [1, 2], "myintarray": [1, 2], "mystringarray": ["a", "b"], "mylist": ["a", "b", "c"]}>
> testWriteDecimalBytes(org.apache.parquet.avro.TestGenericLogicalTypes): Should read BigDecimals as bytes
>   expected:<[{"dec": {"bytes": "ò\u0096"}}, {"dec": {"bytes": "\u²àø"}}]>
>   but was:<[{"dec": {"bytes": ""}}, {"dec": {"bytes": ""}}]>
> testAll[0](org.apache.parquet.avro.TestReadWrite): expected:but was:
> testAllUsingDefaultAvroSchema[0](org.apache.parquet.avro.TestReadWrite): expected: but was:
> testAll[1](org.apache.parquet.avro.TestReadWrite): expected: but was:
> testAllUsingDefaultAvroSchema[1](org.apache.parquet.avro.TestReadWrite): expected: but was:
> testAll[0](org.apache.parquet.avro.TestReadWriteOldListBehavior): expected: but was:
> testAllUsingDefaultAvroSchema[0](org.apache.parquet.avro.TestReadWriteOldListBehavior): expected: but was:
> testAll[1](org.apache.parquet.avro.TestReadWriteOldListBehavior): expected: but was:
> testAllUsingDefaultAvroSchema[1](org.apache.parquet.avro.TestReadWriteOldListBehavior): expected: but was:
> {code}
> I see two classes of problems:
> # The json with byte arrays appear different.
> # Some tests compare the 'toString' of a ByteBuffer. Now for two ByteBuffers
> that both contain the SAME bytes these tests fail simply because the position
> field of the ByteBuffer is different. I think these should compare the
> contents of the ByteBuffer instead.
> {code}
> but
> was:
> {code}

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
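The second class of problem in the description (comparing the 'toString' of a ByteBuffer) can be illustrated with a small, self-contained sketch. This is not code from parquet-avro; the class name `ByteBufferCompareSketch` and the helper `contents` are hypothetical names chosen for illustration. It assumes that "contents" means the bytes between index 0 and the buffer's limit, and it shows that two buffers holding the same bytes stop matching on `toString()` as soon as one of them has been read from.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class ByteBufferCompareSketch {

    /** Hypothetical helper: copies the bytes from index 0 up to the limit without moving the caller's position. */
    static byte[] contents(ByteBuffer buffer) {
        ByteBuffer view = buffer.duplicate(); // shares the bytes, but has its own position and limit
        view.rewind();                        // position = 0, limit unchanged
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        ByteBuffer expected = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});
        ByteBuffer actual = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});
        actual.get(new byte[4]); // simulate a reader that consumed the buffer: position is now 4

        System.out.println(expected); // java.nio.HeapByteBuffer[pos=0 lim=4 cap=4]
        System.out.println(actual);   // java.nio.HeapByteBuffer[pos=4 lim=4 cap=4]

        // The string forms differ only because the position differs.
        System.out.println(expected.toString().equals(actual.toString())); // false
        // ByteBuffer.equals looks at the remaining bytes only, so it is position-sensitive too.
        System.out.println(expected.equals(actual)); // false
        // Comparing the copied contents ignores the position.
        System.out.println(Arrays.equals(contents(expected), contents(actual))); // true
    }
}
```

Working on a `duplicate()` keeps the comparison itself from changing the position of the buffers under test, which matters here because a moved position is exactly what made the tests fail.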
[jira] [Commented] (PARQUET-725) Parquet AVRO tests fail when debug logging is enabled
[ https://issues.apache.org/jira/browse/PARQUET-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15704638#comment-15704638 ]

Niels Basjes commented on PARQUET-725:
--------------------------------------
Looks like the date problem has been picked up in PARQUET-765.
To fix the debug logging problem Avro 1.8.2 is needed.
[jira] [Commented] (PARQUET-725) Parquet AVRO tests fail when debug logging is enabled
[ https://issues.apache.org/jira/browse/PARQUET-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15588034#comment-15588034 ]

Niels Basjes commented on PARQUET-725:
--------------------------------------
Easy fix:
{code}
diff --git a/parquet-avro/src/test/resources/car.avdl b/parquet-avro/src/test/resources/car.avdl
index b848da5..1f459a3 100644
--- a/parquet-avro/src/test/resources/car.avdl
+++ b/parquet-avro/src/test/resources/car.avdl
@@ -21,7 +21,7 @@ protocol Cars {
 record Service {
-long date;
+long `date`;
 string mechanic;
 }
{code}
[jira] [Updated] (PARQUET-725) Parquet AVRO tests fail when debug logging is enabled
[ https://issues.apache.org/jira/browse/PARQUET-725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Niels Basjes updated PARQUET-725:
---------------------------------
Summary: Parquet AVRO tests fail when debug logging is enabled (was: Parquet AVRO tests fail)
[jira] [Updated] (PARQUET-740) Introduce editorconfig
[ https://issues.apache.org/jira/browse/PARQUET-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Niels Basjes updated PARQUET-740:
---------------------------------
Description:
Editor config is a very easy way of ensuring that developers adhere more closely to the same coding standards when it comes to using tabs/spaces, trailing spaces, end of lines etc.

Quote from http://editorconfig.org/
{quote}
EditorConfig helps developers define and maintain consistent coding styles between different editors and IDEs. The EditorConfig project consists of a file format for defining coding styles and a collection of text editor plugins that enable editors to read the file format and adhere to defined styles. EditorConfig files are easily readable and they work nicely with version control systems.
{quote}

was:
Editor config is a very easy way of ensuring that developers adhere more closely to the same coding standards when it comes to using tabs/spaces, trailing spaces, end of lines etc.

> Introduce editorconfig
> ----------------------
>
> Key: PARQUET-740
> URL: https://issues.apache.org/jira/browse/PARQUET-740
> Project: Parquet
> Issue Type: Improvement
> Reporter: Niels Basjes
> Assignee: Niels Basjes

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (PARQUET-740) Introduce editorconfig
Niels Basjes created PARQUET-740:
---------------------------------
Summary: Introduce editorconfig
Key: PARQUET-740
URL: https://issues.apache.org/jira/browse/PARQUET-740
Project: Parquet
Issue Type: Improvement
Reporter: Niels Basjes
Assignee: Niels Basjes

Editor config is a very easy way of ensuring that developers adhere more closely to the same coding standards when it comes to using tabs/spaces, trailing spaces, end of lines etc.
[jira] [Commented] (PARQUET-725) Parquet AVRO tests fail
[ https://issues.apache.org/jira/browse/PARQUET-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532822#comment-15532822 ]

Niels Basjes commented on PARQUET-725:
--------------------------------------
Note that AVRO 1.8.2 (as it currently stands) is likely to introduce the problem described in AVRO-1924, because the construct it affects is used in the file parquet-avro/src/test/resources/car.avdl
[jira] [Commented] (PARQUET-725) Parquet AVRO tests fail
[ https://issues.apache.org/jira/browse/PARQUET-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532225#comment-15532225 ]

Niels Basjes commented on PARQUET-725:
--------------------------------------
After AVRO 1.8.2 has been released this problem can easily be fixed by changing the version of AVRO in the pom.xml.
So this issue will eventually be fixed with:
{code}
diff --git pom.xml pom.xml
index ca34309..e1f0cfd 100644
--- pom.xml
+++ pom.xml
@@ -86,7 +86,7 @@
 6.5.7
 0.9.33
 1.7.5
-1.8.0
+1.8.2
 11.0
 1.9.5
{code}
[jira] [Assigned] (PARQUET-725) Parquet AVRO tests fail
[ https://issues.apache.org/jira/browse/PARQUET-725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Niels Basjes reassigned PARQUET-725:
------------------------------------
Assignee: Niels Basjes
[jira] [Commented] (PARQUET-727) Ensure correct version of thrift is used
[ https://issues.apache.org/jira/browse/PARQUET-727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15522718#comment-15522718 ]

Niels Basjes commented on PARQUET-727:
--------------------------------------
[~julienledem] [~rdblue] Can you guys please add me as a contributor in Jira so I can assign Parquet issues to myself?
Thanks.

> Ensure correct version of thrift is used
> ----------------------------------------
>
> Key: PARQUET-727
> URL: https://issues.apache.org/jira/browse/PARQUET-727
> Project: Parquet
> Issue Type: Improvement
> Reporter: Niels Basjes
>
> I found that if you have the wrong version of thrift in your path during the
> build the errors you get are very obscure and verbose.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PARQUET-725) Parquet AVRO tests fail
[ https://issues.apache.org/jira/browse/PARQUET-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516632#comment-15516632 ] Niels Basjes commented on PARQUET-725: -- So the root cause is AVRO-1799, which surfaced after the logging setup was changed in PARQUET-423. For now I pushed a default log4j.properties that is set to INFO logging. This passed the build. Set it to DEBUG and the build will fail. The action that remains for this issue is to upgrade to AVRO 1.8.2 (as soon as it is released). If you don't, the AVRO related data is likely to be modified by the very act of logging it. > Parquet AVRO tests fail > --- > > Key: PARQUET-725 > URL: https://issues.apache.org/jira/browse/PARQUET-725 > Project: Parquet > Issue Type: Bug >Reporter: Niels Basjes
[jira] [Comment Edited] (PARQUET-725) Parquet AVRO tests fail
[ https://issues.apache.org/jira/browse/PARQUET-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516512#comment-15516512 ] Niels Basjes edited comment on PARQUET-725 at 9/23/16 2:00 PM: --- Found the root cause (fixed in a yet to be released version of AVRO): AVRO-1799: java: GenericData.toString() mutates underlying ByteBuffer backed data This is also the reason the problem did not occur in my IDE (IntelliJ). Under the hood the debugger calls 'toString' to show the record on the screen during debugging. Because this was done on both records, the 'equals' a step later would succeed, whereas in a normal run it would fail. was (Author: nielsbasjes): Found the probable root cause (fixed in a yet to be released version of AVRO): AVRO-1799: java: GenericData.toString() mutates underlying ByteBuffer backed data > Parquet AVRO tests fail > --- > > Key: PARQUET-725 > URL: https://issues.apache.org/jira/browse/PARQUET-725 > Project: Parquet > Issue Type: Bug >Reporter: Niels Basjes
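
The effect described in the comment above can be reproduced with the JDK alone. This is a minimal sketch, not the Avro code: consumingToString is a hypothetical stand-in for a toString() that reads a ByteBuffer with relative get() calls, and the class name is made up for illustration. It shows why rendering only one of two equal buffers makes equals() fail, and why rendering both (as a debugger does) makes it pass again.

```java
import java.nio.ByteBuffer;

final class ByteBufferPositionDemo {

    // Hypothetical stand-in for a careless toString(): relative get() calls
    // advance the buffer's position as a side effect of reading.
    static String consumingToString(ByteBuffer buffer) {
        StringBuilder sb = new StringBuilder();
        while (buffer.hasRemaining()) {
            sb.append(String.format("%02x", buffer.get()));
        }
        return sb.toString();
    }

    // Reads the same bytes through a duplicate, which has its own position,
    // so the caller's buffer is left untouched.
    static String safeToString(ByteBuffer buffer) {
        return consumingToString(buffer.duplicate());
    }

    public static void main(String[] args) {
        ByteBuffer expected = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});
        ByteBuffer actual = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});

        // ByteBuffer.equals compares the remaining bytes of both buffers.
        System.out.println(expected.equals(actual));

        // "Logging" only one of the two buffers drains it, so equals() now
        // compares four remaining bytes against zero remaining bytes.
        System.out.println(consumingToString(actual));
        System.out.println(expected.equals(actual));

        // Draining the other buffer too leaves both empty, hence equal again.
        consumingToString(expected);
        System.out.println(expected.equals(actual));
    }
}
```

Reading through duplicate(), as safeToString does, is one common way to avoid the side effect; whether the AVRO-1799 fix takes that route is not stated in this thread.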
[jira] [Commented] (PARQUET-725) Parquet AVRO tests fail
[ https://issues.apache.org/jira/browse/PARQUET-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516512#comment-15516512 ] Niels Basjes commented on PARQUET-725: -- Found the probable root cause (fixed in a yet to be released version of AVRO): AVRO-1799: java: GenericData.toString() mutates underlying ByteBuffer backed data > Parquet AVRO tests fail > --- > > Key: PARQUET-725 > URL: https://issues.apache.org/jira/browse/PARQUET-725 > Project: Parquet > Issue Type: Bug >Reporter: Niels Basjes
[jira] [Created] (PARQUET-727) Ensure correct version of thrift is used
Niels Basjes created PARQUET-727: Summary: Ensure correct version of thrift is used Key: PARQUET-727 URL: https://issues.apache.org/jira/browse/PARQUET-727 Project: Parquet Issue Type: Improvement Reporter: Niels Basjes I found that if you have the wrong version of thrift in your path during the build the errors you get are very obscure and verbose.
[jira] [Created] (PARQUET-725) Parquet AVRO tests fail
Niels Basjes created PARQUET-725: Summary: Parquet AVRO tests fail Key: PARQUET-725 URL: https://issues.apache.org/jira/browse/PARQUET-725 Project: Parquet Issue Type: Bug Reporter: Niels Basjes I found that on my machine some of the tests in the parquet-avro fail. {code} Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.073 sec Running org.apache.parquet.avro.TestAvroDataSupplier Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec Running org.apache.parquet.avro.TestReadWrite Tests run: 18, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 0.414 sec <<< FAILURE! Running org.apache.parquet.avro.TestBackwardCompatibility Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.016 sec Running org.apache.parquet.avro.TestReadWriteOldListBehavior Tests run: 16, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 0.148 sec <<< FAILURE! Running org.apache.parquet.avro.TestInputOutputFormat Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.29 sec Running org.apache.parquet.avro.TestReflectLogicalTypes Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.165 sec Running org.apache.parquet.avro.TestCircularReferences Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0 sec Results : Failed tests: testWriteReflectReadGeneric(org.apache.parquet.avro.TestReflectReadWrite): expected:<{"myboolean": true, "mybyte": 1, "myshort": 1, "myint": 1, "mylong": 2, "myfloat": 3.1, "mydouble": 4.1, "mybytes": {"bytes": "\u0001\u0002\u0003\u0004"}, "mystring": "Hello", "myenum": "A", "mymap": {"a": "1", "b": "2"}, "myshortarray": [1, 2], "myintarray": [1, 2], "mystringarray": ["a", "b"], "mylist": ["a", "b", "c"]}> but was:<{"myboolean": true, "mybyte": 1, "myshort": 1, "myint": 1, "mylong": 2, "myfloat": 3.1, "mydouble": 4.1, "mybytes": {"bytes": ""}, "mystring": "Hello", "myenum": "A", "mymap": {"a": "1", "b": "2"}, "myshortarray": [1, 2], "myintarray": [1, 2], "mystringarray": ["a", "b"], "mylist": 
["a", "b", "c"]}> testWriteDecimalBytes(org.apache.parquet.avro.TestGenericLogicalTypes): Should read BigDecimals as bytes expected:<[{"dec": {"bytes": "ò\u0096"}}, {"dec": {"bytes": "\u²àø"}}]> but was:<[{"dec": {"bytes": ""}}, {"dec": {"bytes": ""}}]> testAll[0](org.apache.parquet.avro.TestReadWrite): expected:but was: testAllUsingDefaultAvroSchema[0](org.apache.parquet.avro.TestReadWrite): expected: but was: testAll[1](org.apache.parquet.avro.TestReadWrite): expected: but was: testAllUsingDefaultAvroSchema[1](org.apache.parquet.avro.TestReadWrite): expected: but was: testAll[0](org.apache.parquet.avro.TestReadWriteOldListBehavior): expected: but was: testAllUsingDefaultAvroSchema[0](org.apache.parquet.avro.TestReadWriteOldListBehavior): expected: but was: testAll[1](org.apache.parquet.avro.TestReadWriteOldListBehavior): expected: but was: testAllUsingDefaultAvroSchema[1](org.apache.parquet.avro.TestReadWriteOldListBehavior): expected: but was: {code} I see two classes of problems: # The json with byte arrays appear different. # Some tests compare the 'toString' of a ByteBuffer. Now for two ByteBuffers that both contain the SAME bytes these tests fail simply because the position field of the ByteBuffer is different. I think these should compare the contents of the ByteBuffer instead. {code} but was: {code}
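
As an illustration of the second class of problems, here is a small sketch of a position-independent comparison. It is not taken from the parquet-avro tests; the class and method names are hypothetical, and it assumes that "contents" means every byte from index 0 up to the buffer's limit. A buffer that is a window into a larger backing array may need a different definition.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class ByteBufferContents {

    // Copies every byte between index 0 and the limit, ignoring the current
    // position. Works on a duplicate so the caller's buffer is not modified.
    static byte[] contents(ByteBuffer buffer) {
        ByteBuffer view = buffer.duplicate();
        view.rewind();
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes);
        return bytes;
    }

    // Hypothetical helper: true when both buffers hold the same bytes,
    // regardless of how far either one has already been read.
    static boolean sameContents(ByteBuffer a, ByteBuffer b) {
        return Arrays.equals(contents(a), contents(b));
    }

    public static void main(String[] args) {
        ByteBuffer a = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});
        ByteBuffer b = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});
        b.position(4); // simulate a buffer that has already been read

        // toString() includes the position, so the strings differ.
        System.out.println(a.toString().equals(b.toString()));
        // equals() only looks at the remaining bytes, so it differs as well.
        System.out.println(a.equals(b));
        // Comparing the copied contents ignores the position.
        System.out.println(sameContents(a, b));
    }
}
```

In a JUnit test the same idea could be expressed with assertArrayEquals on the two copied arrays, which also gives a readable failure message.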
[jira] [Created] (PARQUET-722) Building with JDK 8 fails over a maven bug
Niels Basjes created PARQUET-722: Summary: Building with JDK 8 fails over a maven bug Key: PARQUET-722 URL: https://issues.apache.org/jira/browse/PARQUET-722 Project: Parquet Issue Type: Bug Reporter: Niels Basjes When I build parquet on my system I get this error during the build: {quote} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process (default) on project parquet-generator: Error rendering velocity resource. NullPointerException -> [Help 1] {quote} About a year ago [~julienledem] responded on this page that this is caused by a bug in Maven in combination with Java 8: http://stackoverflow.com/questions/31229445/build-failure-apache-parquet-mr-source-mvn-install-failure/33360512#33360512 This bug has since been solved at the Maven end in maven-filtering 1.2: https://issues.apache.org/jira/browse/MSHARED-319 The problem is that this fix has not yet been integrated into the latest available Maven versions. I'll put up a pull request with a proposed fix for this.
[jira] [Created] (PARQUET-423) Make writing Avro to Parquet less noisy
Niels Basjes created PARQUET-423: Summary: Make writing Avro to Parquet less noisy Key: PARQUET-423 URL: https://issues.apache.org/jira/browse/PARQUET-423 Project: Parquet Issue Type: Improvement Components: parquet-avro Affects Versions: 1.8.0 Reporter: Niels Basjes Priority: Minor

When writing Avro files to disk using the AvroParquetWriter, some statistics are written to the logging system for each column in the file. When writing files based on a large Avro schema, the output of this logging is often no longer useful and becomes a hassle. Because the logging level is hardcoded (why?) into the parquet library, I would like to introduce a switch that allows this type of logging to be enabled or disabled.

{code}
Jan 12, 2016 1:43:00 PM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 90B for [IPAddress] BINARY: 60 values, 26B raw, 47B comp, 1 pages, encodings: [RLE_DICTIONARY, PLAIN], dic { 7 entries, 77B raw, 7B comp}
Jan 12, 2016 1:43:00 PM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 102B for [country] BINARY: 60 values, 26B raw, 47B comp, 1 pages, encodings: [RLE_DICTIONARY, PLAIN], dic { 7 entries, 119B raw, 7B comp}
Jan 12, 2016 1:43:00 PM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 152B for [windowid] BINARY: 60 values, 33B raw, 51B comp, 1 pages, encodings: [RLE_DICTIONARY, PLAIN], dic { 12 entries, 480B raw, 12B comp}
Jan 12, 2016 1:43:00 PM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 77B for [customerId] BINARY: 58 values, 22B raw, 42B comp, 1 pages, encodings: [RLE_DICTIONARY, PLAIN], dic { 7 entries, 49B raw, 7B comp}
Jan 12, 2016 1:43:00 PM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 86B for [sessionId] BINARY: 58 values, 28B raw, 43B comp, 1 pages, encodings: [RLE_DICTIONARY, PLAIN], dic { 10 entries, 110B raw, 10B comp}
Jan 12, 2016 1:43:00 PM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 93B for [sessionEventNr] INT64: 58 values, 34B raw, 48B comp, 1 pages, encodings: [RLE_DICTIONARY, PLAIN], dic { 14 entries, 112B raw, 14B comp}
Jan 12, 2016 1:43:00 PM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 114B for [visitId] BINARY: 58 values, 28B raw, 43B comp, 1 pages, encodings: [RLE_DICTIONARY, PLAIN], dic { 10 entries, 250B raw, 10B comp}
Jan 12, 2016 1:43:00 PM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 90B for [visitEventNr] INT64: 58 values, 34B raw, 45B comp, 1 pages, encodings: [RLE_DICTIONARY, PLAIN], dic { 11 entries, 88B raw, 11B comp}
Jan 12, 2016 1:43:00 PM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 112B for [timestamp] INT64: 58 values, 50B raw, 66B comp, 1 pages, encodings: [RLE_DICTIONARY, PLAIN], dic { 46 entries, 368B raw, 46B comp}
Jan 12, 2016 1:43:00 PM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 85B for [IPAddress] BINARY: 58 values, 22B raw, 42B comp, 1 pages, encodings: [RLE_DICTIONARY, PLAIN], dic { 7 entries, 77B raw, 7B comp}
Jan 12, 2016 1:43:00 PM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 97B for [country] BINARY: 58 values, 22B raw, 42B comp, 1 pages, encodings: [RLE_DICTIONARY, PLAIN], dic { 7 entries, 119B raw, 7B comp}
Jan 12, 2016 1:43:00 PM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 144B for [windowid] BINARY: 58 values, 28B raw, 43B comp, 1 pages, encodings: [RLE_DICTIONARY, PLAIN], dic { 10 entries, 400B raw, 10B comp}
{code}
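
To make the requested switch concrete, here is a rough sketch of what an on/off toggle could look like. It is an illustration only and not Parquet's API: it assumes these messages go through java.util.logging (the timestamp format suggests this, but the issue does not say so), the system property name is invented, and since the issue describes the level as hardcoded inside the library, setting a level from the outside like this may well have no effect on Parquet 1.8.0 itself.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class WriteStatsLogSwitch {

    // Logger name taken from the output above.
    static final String LOGGER_NAME =
            "org.apache.parquet.hadoop.ColumnChunkPageWriteStore";

    // Hypothetical property name, made up for this sketch.
    static final String SWITCH_PROPERTY = "example.parquet.writestats.log";

    // Keep a strong reference: java.util.logging holds loggers weakly, so a
    // level set on an otherwise unreferenced logger can be lost.
    static final Logger LOGGER = Logger.getLogger(LOGGER_NAME);

    // enabled = true lets INFO messages through, false raises the threshold
    // to WARNING so the per-column statistics would be dropped.
    static void apply(boolean enabled) {
        LOGGER.setLevel(enabled ? Level.INFO : Level.WARNING);
    }

    public static void main(String[] args) {
        // The switch defaults to "true", i.e. the current noisy behaviour.
        boolean enabled =
                Boolean.parseBoolean(System.getProperty(SWITCH_PROPERTY, "true"));
        apply(enabled);
        System.out.println("INFO enabled: " + LOGGER.isLoggable(Level.INFO));
    }
}
```

Running it with -Dexample.parquet.writestats.log=false would raise the threshold for that one logger; a real fix inside the library would more likely lower the level of these messages or route them through a configurable logging facade.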