[GitHub] [orc] mystic-lama commented on pull request #1581: ORC-1477: Remove unused imports from Test classes
mystic-lama commented on PR #1581: URL: https://github.com/apache/orc/pull/1581#issuecomment-1683238843 @dongjoon-hyun As of now there are no further unused imports, ran the inspection from IntelliJ and verified. Perfectly fine refining the title, open to suggestions :) I personally feel that we should enforce this on test classes too. I rely a lot on test code to understand things within ORC. I do not know how much effort is involved right now, however I am happy to take a tab at it, one module at a time. Let me create a JIRA for this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@orc.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [orc] dongjoon-hyun commented on pull request #1579: ORC-1475: [C++] ConvertColumnReader.TestConvertNumericToStringVariant fails when compiled with unsigned char
dongjoon-hyun commented on PR #1579: URL: https://github.com/apache/orc/pull/1579#issuecomment-1683189384 To @mystic-lama , of course, ORC-1475 is based on your initial report on dev@orc. We appreciated your contribution (testing and reporting). > @dongjoon-hyun I also found the issue during release validation on M1 What I did here is to reproduce your initial report. :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@orc.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [orc] deshanxiao commented on a diff in pull request #1588: ORC-1483: [Java] Remove obsolete FileMetadata
deshanxiao commented on code in PR #1588: URL: https://github.com/apache/orc/pull/1588#discussion_r1297887834 ## java/core/src/java/org/apache/orc/OrcFile.java: ## @@ -350,18 +346,6 @@ public KeyProvider getKeyProvider() { return keyProvider; } -/** - * @deprecated Use {@link #orcTail(OrcTail)} instead. - */ -public ReaderOptions fileMetadata(final FileMetadata metadata) { - fileMetadata = metadata; - return this; -} - -public FileMetadata getFileMetadata() { Review Comment: +1 for @mystic-lama points. Acutally this PR targets 2.0 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@orc.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [orc] dongjoon-hyun commented on a diff in pull request #1587: ORC-1481: [C++] Better error message when TZDB is unavailable
dongjoon-hyun commented on code in PR #1587: URL: https://github.com/apache/orc/pull/1587#discussion_r1297877017 ## c++/src/Timezone.cc: ## @@ -675,6 +675,13 @@ namespace orc { if (itr != timezoneCache.end()) { return *(itr->second).get(); } +if (!std::filesystem::exists(std::filesystem::path(filename))) { + std::stringstream ss; + ss << "Time zone file " << filename << " does not exist." + << " Please install IANA time zone database and set TZDIR env" + << " if it is not installed at /usr/share/zoneinfo"; Review Comment: Thank you for investigating and making a PR, @wgtmac . Just a question: The original issue report (#1577) also mentioned Windows, what happens in Windows? Is this an actionable message in Windows? > Specifically on Windows, the DEFAULT_TZDIR location is an invalid path, -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@orc.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [orc] dongjoon-hyun commented on a diff in pull request #1587: ORC-1481: [C++] Better error message when TZDB is unavailable
dongjoon-hyun commented on code in PR #1587: URL: https://github.com/apache/orc/pull/1587#discussion_r1297877017 ## c++/src/Timezone.cc: ## @@ -675,6 +675,13 @@ namespace orc { if (itr != timezoneCache.end()) { return *(itr->second).get(); } +if (!std::filesystem::exists(std::filesystem::path(filename))) { + std::stringstream ss; + ss << "Time zone file " << filename << " does not exist." + << " Please install IANA time zone database and set TZDIR env" + << " if it is not installed at /usr/share/zoneinfo"; Review Comment: Thank you for investigating and making a PR, @wgtmac . Just a question: The original issue report (#1577) also mentioned Windows, what happens in Windows? > Specifically on Windows, the DEFAULT_TZDIR location is an invalid path, -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@orc.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [orc] dongjoon-hyun commented on a diff in pull request #1588: ORC-1483: [Java] Remove obsolete FileMetadata
dongjoon-hyun commented on code in PR #1588: URL: https://github.com/apache/orc/pull/1588#discussion_r1297874808 ## java/core/src/java/org/apache/orc/OrcFile.java: ## @@ -350,18 +346,6 @@ public KeyProvider getKeyProvider() { return keyProvider; } -/** - * @deprecated Use {@link #orcTail(OrcTail)} instead. - */ -public ReaderOptions fileMetadata(final FileMetadata metadata) { - fileMetadata = metadata; - return this; -} - -public FileMetadata getFileMetadata() { Review Comment: As @mystic-lama pointed, Apache ORC community cannot drop the public API without a deprecation, @deshanxiao . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@orc.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Resolved] (ORC-1478) Add Unit Test for org.apache.orc.impl.DynamicIntArray
[ https://issues.apache.org/jira/browse/ORC-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved ORC-1478. Fix Version/s: 2.0.0 Resolution: Fixed Issue resolved by pull request 1584 [https://github.com/apache/orc/pull/1584] > Add Unit Test for org.apache.orc.impl.DynamicIntArray > - > > Key: ORC-1478 > URL: https://issues.apache.org/jira/browse/ORC-1478 > Project: ORC > Issue Type: Test > Components: Java >Affects Versions: 1.9.0 >Reporter: Ash >Assignee: Ash >Priority: Trivial > Fix For: 2.0.0 > > > Add Unit Test case for org.apache.orc.impl.DynamicIntArray -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ORC-1478) Add Unit Test for org.apache.orc.impl.DynamicIntArray
[ https://issues.apache.org/jira/browse/ORC-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated ORC-1478: --- Affects Version/s: 2.0.0 (was: 1.9.0) > Add Unit Test for org.apache.orc.impl.DynamicIntArray > - > > Key: ORC-1478 > URL: https://issues.apache.org/jira/browse/ORC-1478 > Project: ORC > Issue Type: Test > Components: Java >Affects Versions: 2.0.0 >Reporter: Ash >Assignee: Ash >Priority: Trivial > Fix For: 2.0.0 > > > Add Unit Test case for org.apache.orc.impl.DynamicIntArray -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [orc] dongjoon-hyun closed pull request #1584: ORC-1478: Adding Unit Tests for org.apache.orc.impl.DynamicIntArray
dongjoon-hyun closed pull request #1584: ORC-1478: Adding Unit Tests for org.apache.orc.impl.DynamicIntArray URL: https://github.com/apache/orc/pull/1584 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@orc.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (ORC-1482) RecordReaderImpl.evaluatePredicateProto assumes floating point stats are always present
[ https://issues.apache.org/jira/browse/ORC-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Darrell Lowe updated ORC-1482: Attachment: part-0-45af54ac-9306-447f-8d4f-cc2f1b1cf61f-c000.snappy.orc > RecordReaderImpl.evaluatePredicateProto assumes floating point stats are > always present > --- > > Key: ORC-1482 > URL: https://issues.apache.org/jira/browse/ORC-1482 > Project: ORC > Issue Type: Bug >Affects Versions: 1.7.4 >Reporter: Jason Darrell Lowe >Priority: Major > Attachments: > part-0-45af54ac-9306-447f-8d4f-cc2f1b1cf61f-c000.snappy.orc > > > ORC-629 added custom handling of predicate pushdown on doubles, but the code > that was added blindly assumes that double statistics were present in the > file which may not have been the case. Here's the snippet of code in > question: > {code:java} > } else if (category == TypeDescription.Category.DOUBLE || > category == TypeDescription.Category.FLOAT) { > DoubleColumnStatistics dstas = (DoubleColumnStatistics) cs; > {code} > > Elsewhere in the code, there's a type check on the result of statistics > deserialization before casting, but here the type checks are missing. It > appears this should be checking for DoubleColumnStatistics before assuming > the cast will succeed, and if the expected statistics type is not present > then the predicate should not be evaluated on that column. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ORC-1482) RecordReaderImpl.evaluatePredicateProto assumes floating point stats are always present
[ https://issues.apache.org/jira/browse/ORC-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17755565#comment-17755565 ] Jason Darrell Lowe commented on ORC-1482: - > For the official orc writer, if it's a DOUBLE / FLOAT type, I think it must > write DoubleColumnStatistics Yes, the Java ORC writer always writes the statistics. This error was found while trying to read an ORC file generated by another writer, RAPIDS cudf. See the discussion at [https://github.com/rapidsai/cudf/issues/13793.] While the official writer always generates stats, it's interesting that everywhere else in the reader code I looked, it is tolerant of missing statistics. The reader was tolerant of this before ORC-629. Seems like an easy thing to fix, independent of whether the spec is updated to clarify when statistics are required. > RecordReaderImpl.evaluatePredicateProto assumes floating point stats are > always present > --- > > Key: ORC-1482 > URL: https://issues.apache.org/jira/browse/ORC-1482 > Project: ORC > Issue Type: Bug >Affects Versions: 1.7.4 >Reporter: Jason Darrell Lowe >Priority: Major > > ORC-629 added custom handling of predicate pushdown on doubles, but the code > that was added blindly assumes that double statistics were present in the > file which may not have been the case. Here's the snippet of code in > question: > {code:java} > } else if (category == TypeDescription.Category.DOUBLE || > category == TypeDescription.Category.FLOAT) { > DoubleColumnStatistics dstas = (DoubleColumnStatistics) cs; > {code} > > Elsewhere in the code, there's a type check on the result of statistics > deserialization before casting, but here the type checks are missing. It > appears this should be checking for DoubleColumnStatistics before assuming > the cast will succeed, and if the expected statistics type is not present > then the predicate should not be evaluated on that column. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [orc] mystic-lama commented on a diff in pull request #1588: ORC-1483: [Java] Remove obsolete FileMetadata
mystic-lama commented on code in PR #1588: URL: https://github.com/apache/orc/pull/1588#discussion_r1297189413 ## java/core/src/java/org/apache/orc/OrcFile.java: ## @@ -350,18 +346,6 @@ public KeyProvider getKeyProvider() { return keyProvider; } -/** - * @deprecated Use {@link #orcTail(OrcTail)} instead. - */ -public ReaderOptions fileMetadata(final FileMetadata metadata) { - fileMetadata = metadata; - return this; -} - -public FileMetadata getFileMetadata() { Review Comment: Shall we mark this method as deprecated in early releases and then remove in 2.0? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@orc.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org