[GitHub] [orc] mystic-lama commented on pull request #1581: ORC-1477: Remove unused imports from Test classes

2023-08-17 Thread via GitHub


mystic-lama commented on PR #1581:
URL: https://github.com/apache/orc/pull/1581#issuecomment-1683238843

   @dongjoon-hyun As of now there are no further unused imports, ran the 
inspection from IntelliJ and verified.
   Perfectly fine refining the title, open to suggestions :) 
   
   I personally feel that we should enforce this on test classes too. I rely a 
lot on test code to understand things within ORC.
   I do not know how much effort is involved right now, however I am happy to 
take a tab at it, one module at a time. Let me create a JIRA for this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@orc.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on pull request #1579: ORC-1475: [C++] ConvertColumnReader.TestConvertNumericToStringVariant fails when compiled with unsigned char

2023-08-17 Thread via GitHub


dongjoon-hyun commented on PR #1579:
URL: https://github.com/apache/orc/pull/1579#issuecomment-1683189384

   To @mystic-lama , of course, ORC-1475 is based on your initial report on 
dev@orc. We appreciated your contribution (testing and reporting).
   > @dongjoon-hyun I also found the issue during release validation on M1
   
   What I did here is to reproduce your initial report. :) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@orc.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [orc] deshanxiao commented on a diff in pull request #1588: ORC-1483: [Java] Remove obsolete FileMetadata

2023-08-17 Thread via GitHub


deshanxiao commented on code in PR #1588:
URL: https://github.com/apache/orc/pull/1588#discussion_r1297887834


##
java/core/src/java/org/apache/orc/OrcFile.java:
##
@@ -350,18 +346,6 @@ public KeyProvider getKeyProvider() {
   return keyProvider;
 }
 
-/**
- * @deprecated Use {@link #orcTail(OrcTail)} instead.
- */
-public ReaderOptions fileMetadata(final FileMetadata metadata) {
-  fileMetadata = metadata;
-  return this;
-}
-
-public FileMetadata getFileMetadata() {

Review Comment:
   +1 for @mystic-lama points. Acutally this PR targets 2.0



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@orc.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a diff in pull request #1587: ORC-1481: [C++] Better error message when TZDB is unavailable

2023-08-17 Thread via GitHub


dongjoon-hyun commented on code in PR #1587:
URL: https://github.com/apache/orc/pull/1587#discussion_r1297877017


##
c++/src/Timezone.cc:
##
@@ -675,6 +675,13 @@ namespace orc {
 if (itr != timezoneCache.end()) {
   return *(itr->second).get();
 }
+if (!std::filesystem::exists(std::filesystem::path(filename))) {
+  std::stringstream ss;
+  ss << "Time zone file " << filename << " does not exist."
+ << " Please install IANA time zone database and set TZDIR env"
+ << " if it is not installed at /usr/share/zoneinfo";

Review Comment:
   Thank you for investigating and making a PR, @wgtmac . 
   
   Just a question: The original issue report (#1577) also mentioned Windows, 
what happens in Windows? Is this an actionable message in Windows?
   > Specifically on Windows, the DEFAULT_TZDIR location is an invalid path, 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@orc.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a diff in pull request #1587: ORC-1481: [C++] Better error message when TZDB is unavailable

2023-08-17 Thread via GitHub


dongjoon-hyun commented on code in PR #1587:
URL: https://github.com/apache/orc/pull/1587#discussion_r1297877017


##
c++/src/Timezone.cc:
##
@@ -675,6 +675,13 @@ namespace orc {
 if (itr != timezoneCache.end()) {
   return *(itr->second).get();
 }
+if (!std::filesystem::exists(std::filesystem::path(filename))) {
+  std::stringstream ss;
+  ss << "Time zone file " << filename << " does not exist."
+ << " Please install IANA time zone database and set TZDIR env"
+ << " if it is not installed at /usr/share/zoneinfo";

Review Comment:
   Thank you for investigating and making a PR, @wgtmac . 
   
   Just a question: The original issue report (#1577) also mentioned Windows, 
what happens in Windows?
   > Specifically on Windows, the DEFAULT_TZDIR location is an invalid path, 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@orc.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a diff in pull request #1588: ORC-1483: [Java] Remove obsolete FileMetadata

2023-08-17 Thread via GitHub


dongjoon-hyun commented on code in PR #1588:
URL: https://github.com/apache/orc/pull/1588#discussion_r1297874808


##
java/core/src/java/org/apache/orc/OrcFile.java:
##
@@ -350,18 +346,6 @@ public KeyProvider getKeyProvider() {
   return keyProvider;
 }
 
-/**
- * @deprecated Use {@link #orcTail(OrcTail)} instead.
- */
-public ReaderOptions fileMetadata(final FileMetadata metadata) {
-  fileMetadata = metadata;
-  return this;
-}
-
-public FileMetadata getFileMetadata() {

Review Comment:
   As @mystic-lama pointed, Apache ORC community cannot drop the public API 
without a deprecation, @deshanxiao .



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@orc.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Resolved] (ORC-1478) Add Unit Test for org.apache.orc.impl.DynamicIntArray

2023-08-17 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/ORC-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved ORC-1478.

Fix Version/s: 2.0.0
   Resolution: Fixed

Issue resolved by pull request 1584
[https://github.com/apache/orc/pull/1584]

> Add Unit Test for org.apache.orc.impl.DynamicIntArray
> -
>
> Key: ORC-1478
> URL: https://issues.apache.org/jira/browse/ORC-1478
> Project: ORC
>  Issue Type: Test
>  Components: Java
>Affects Versions: 1.9.0
>Reporter: Ash
>Assignee: Ash
>Priority: Trivial
> Fix For: 2.0.0
>
>
> Add Unit Test case for org.apache.orc.impl.DynamicIntArray



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ORC-1478) Add Unit Test for org.apache.orc.impl.DynamicIntArray

2023-08-17 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/ORC-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated ORC-1478:
---
Affects Version/s: 2.0.0
   (was: 1.9.0)

> Add Unit Test for org.apache.orc.impl.DynamicIntArray
> -
>
> Key: ORC-1478
> URL: https://issues.apache.org/jira/browse/ORC-1478
> Project: ORC
>  Issue Type: Test
>  Components: Java
>Affects Versions: 2.0.0
>Reporter: Ash
>Assignee: Ash
>Priority: Trivial
> Fix For: 2.0.0
>
>
> Add Unit Test case for org.apache.orc.impl.DynamicIntArray



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [orc] dongjoon-hyun closed pull request #1584: ORC-1478: Adding Unit Tests for org.apache.orc.impl.DynamicIntArray

2023-08-17 Thread via GitHub


dongjoon-hyun closed pull request #1584: ORC-1478: Adding Unit Tests for 
org.apache.orc.impl.DynamicIntArray
URL: https://github.com/apache/orc/pull/1584


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@orc.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (ORC-1482) RecordReaderImpl.evaluatePredicateProto assumes floating point stats are always present

2023-08-17 Thread Jason Darrell Lowe (Jira)


 [ 
https://issues.apache.org/jira/browse/ORC-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Darrell Lowe updated ORC-1482:

Attachment: part-0-45af54ac-9306-447f-8d4f-cc2f1b1cf61f-c000.snappy.orc

> RecordReaderImpl.evaluatePredicateProto assumes floating point stats are 
> always present
> ---
>
> Key: ORC-1482
> URL: https://issues.apache.org/jira/browse/ORC-1482
> Project: ORC
>  Issue Type: Bug
>Affects Versions: 1.7.4
>Reporter: Jason Darrell Lowe
>Priority: Major
> Attachments: 
> part-0-45af54ac-9306-447f-8d4f-cc2f1b1cf61f-c000.snappy.orc
>
>
> ORC-629 added custom handling of predicate pushdown on doubles, but the code 
> that was added blindly assumes that double statistics were present in the 
> file which may not have been the case.  Here's the snippet of code in 
> question:
> {code:java}
>      } else if (category == TypeDescription.Category.DOUBLE ||
>         category == TypeDescription.Category.FLOAT) {
>       DoubleColumnStatistics dstas = (DoubleColumnStatistics) cs;
> {code}
>  
> Elsewhere in the code, there's a type check on the result of statistics 
> deserialization before casting, but here the type checks are missing.  It 
> appears this should be checking for DoubleColumnStatistics before assuming 
> the cast will succeed, and if the expected statistics type is not present 
> then the predicate should not be evaluated on that column.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ORC-1482) RecordReaderImpl.evaluatePredicateProto assumes floating point stats are always present

2023-08-17 Thread Jason Darrell Lowe (Jira)


[ 
https://issues.apache.org/jira/browse/ORC-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17755565#comment-17755565
 ] 

Jason Darrell Lowe commented on ORC-1482:
-

> For the official orc writer, if it's a DOUBLE / FLOAT type, I think it must 
> write DoubleColumnStatistics

Yes, the Java ORC writer always writes the statistics.  This error was found 
while trying to read an ORC file generated by another writer, RAPIDS cudf.  See 
the discussion at  [https://github.com/rapidsai/cudf/issues/13793.]

 

While the official writer always generates stats, it's interesting that 
everywhere else in the reader code I looked, it is tolerant of missing 
statistics.  The reader was tolerant of this before ORC-629.  Seems like an 
easy thing to fix, independent of whether the spec is updated to clarify when 
statistics are required.

> RecordReaderImpl.evaluatePredicateProto assumes floating point stats are 
> always present
> ---
>
> Key: ORC-1482
> URL: https://issues.apache.org/jira/browse/ORC-1482
> Project: ORC
>  Issue Type: Bug
>Affects Versions: 1.7.4
>Reporter: Jason Darrell Lowe
>Priority: Major
>
> ORC-629 added custom handling of predicate pushdown on doubles, but the code 
> that was added blindly assumes that double statistics were present in the 
> file which may not have been the case.  Here's the snippet of code in 
> question:
> {code:java}
>      } else if (category == TypeDescription.Category.DOUBLE ||
>         category == TypeDescription.Category.FLOAT) {
>       DoubleColumnStatistics dstas = (DoubleColumnStatistics) cs;
> {code}
>  
> Elsewhere in the code, there's a type check on the result of statistics 
> deserialization before casting, but here the type checks are missing.  It 
> appears this should be checking for DoubleColumnStatistics before assuming 
> the cast will succeed, and if the expected statistics type is not present 
> then the predicate should not be evaluated on that column.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [orc] mystic-lama commented on a diff in pull request #1588: ORC-1483: [Java] Remove obsolete FileMetadata

2023-08-17 Thread via GitHub


mystic-lama commented on code in PR #1588:
URL: https://github.com/apache/orc/pull/1588#discussion_r1297189413


##
java/core/src/java/org/apache/orc/OrcFile.java:
##
@@ -350,18 +346,6 @@ public KeyProvider getKeyProvider() {
   return keyProvider;
 }
 
-/**
- * @deprecated Use {@link #orcTail(OrcTail)} instead.
- */
-public ReaderOptions fileMetadata(final FileMetadata metadata) {
-  fileMetadata = metadata;
-  return this;
-}
-
-public FileMetadata getFileMetadata() {

Review Comment:
   Shall we mark this method as deprecated in early releases and then remove in 
2.0?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@orc.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org