[jira] [Created] (ORC-985) ORC branch 1.7 is producing larger files from java writer

2021-09-03 Thread Owen O'Malley (Jira)
Owen O'Malley created ORC-985:
-

 Summary: ORC branch 1.7 is producing larger files from java writer
 Key: ORC-985
 URL: https://issues.apache.org/jira/browse/ORC-985
 Project: ORC
  Issue Type: Bug
  Components: Java
Affects Versions: 1.7.0
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Running some tests, I noticed a 5% regression in file sizes with branch 1.7 
compared to 1.6. I need to track this down.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ORC-984) Create new writer versions for orc 1.7 and 1.8

2021-09-03 Thread Owen O'Malley (Jira)
Owen O'Malley created ORC-984:
-

 Summary: Create new writer versions for orc 1.7 and 1.8
 Key: ORC-984
 URL: https://issues.apache.org/jira/browse/ORC-984
 Project: ORC
  Issue Type: Improvement
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Currently we can't tell the difference between orc 1.6, 1.7, or 1.8 files. I'd 
like to introduce a pair of new writer versions that distinguish between them.
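
For illustration only, a minimal sketch of how a reader could tell releases apart once each one stamps its own writer version into the file. The enum name, constants, and numeric ids below are hypothetical and are not the values ORC uses or will use.

```java
import java.util.Arrays;
import java.util.Optional;

// Hypothetical sketch: the names and ids are illustrative assumptions.
enum WriterVersionSketch {
  ORC_1_6(1),
  ORC_1_7(2),
  ORC_1_8(3);

  private final int id;

  WriterVersionSketch(int id) {
    this.id = id;
  }

  int getId() {
    return id;
  }

  // Map an id read from a file back to a version; empty if the id is unknown.
  static Optional<WriterVersionSketch> fromId(int id) {
    return Arrays.stream(values()).filter(v -> v.id == id).findFirst();
  }

  // True when this writer is at least as new as the other one.
  boolean includes(WriterVersionSketch other) {
    return id >= other.id;
  }
}
```

With distinct ids, a reader could guard release-specific behavior with a check such as `version.includes(WriterVersionSketch.ORC_1_7)`.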





[GitHub] [orc] pavibhai opened a new pull request #896: ORC-983 Lowered the log level for some messages related to filter processing from INFO to DEBUG

2021-09-03 Thread GitBox


pavibhai opened a new pull request #896:
URL: https://github.com/apache/orc/pull/896


   ### What changes were proposed in this pull request?
   A couple of the log statements related to filter processing have been lowered 
from INFO to DEBUG level.
   
   
   ### Why are the changes needed?
   Make the logging less verbose.
   
   ### How was this patch tested?
   Regression testing as there is no functional change in the patch other than 
the log level.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@orc.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (ORC-983) Lower the log level of some messages related to filter processing

2021-09-03 Thread Pavan Lanka (Jira)
Pavan Lanka created ORC-983:
---

 Summary: Lower the log level of some messages related to filter 
processing
 Key: ORC-983
 URL: https://issues.apache.org/jira/browse/ORC-983
 Project: ORC
  Issue Type: Bug
  Components: Java
Affects Versions: 1.7.0, 1.8.0
Reporter: Pavan Lanka
Assignee: Pavan Lanka


There are a couple of log statements in filter processing that are at INFO 
level; these should be changed to DEBUG level.

The affected messages report the status of `orc.filter.use.selected` and the 
determination of filter columns.
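
For illustration, a minimal sketch of the effect of such a change. It uses java.util.logging so that it stays self-contained (ORC itself logs through SLF4J), with FINE standing in for DEBUG. The class name, logger name, and messages are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical sketch, not ORC code.
class FilterLogLevelSketch {
  // Collects every record that the logger passes on to its handlers.
  static final class CapturingHandler extends Handler {
    final List<String> messages = new ArrayList<>();

    @Override
    public void publish(LogRecord record) {
      messages.add(record.getMessage());
    }

    @Override
    public void flush() {}

    @Override
    public void close() {}
  }

  // Emits one FINE (debug-like) and one INFO message and returns what got through.
  static List<String> logAt(Level loggerLevel) {
    Logger log = Logger.getLogger("sketch.filter." + loggerLevel.getName());
    log.setUseParentHandlers(false);
    log.setLevel(loggerLevel);
    CapturingHandler handler = new CapturingHandler();
    handler.setLevel(Level.ALL);
    log.addHandler(handler);
    log.fine("filter columns determined"); // lowered message
    log.info("reader opened");             // message left at INFO
    log.removeHandler(handler);
    return handler.messages;
  }
}
```

Calling `logAt(Level.INFO)` keeps only the INFO message, while `logAt(Level.FINE)` also keeps the lowered one, so the detail remains available when someone asks for it.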





Re: [UPDATE] Apache ORC 1.7.0 Preparation

2021-09-03 Thread Dongjoon Hyun
Thank you, William.

BTW, I realized that I made a typo in the title.
This thread is for Apache ORC 1.7.0.
Sorry for the confusion.

Here are additional updates.

- Thanks to Yiqun Zhang's work on ORC-978,
  the Apache ORC 1.7.0 snapshot passed the Apache Iceberg integration test.
- Thanks to Pavan Lanka's work on ORC-980,
  `Filter processing respects the case-sensitivity flag` was fixed.

The only remaining blocker-level JIRA issue is a C++ issue.

ORC-968: Column names used to build SearchArgument should be full path names

Dongjoon.

On 2021/09/03 00:19:49, William Hyun  wrote: 
> Thank you for the status update. 
> 
> On 2021/09/01 04:46:07, Dongjoon Hyun  wrote: 
> > Hi, All.
> > 
> > Here is 1.7.0 preparation status as of today.
> > 
> > # On-going blocker issues.
> > - ORC-968: Column names used to build SearchArgument
> > should be full path names
> > - ORC-978: Fix NPE in TestFlinkOrcReaderWriter
> > 
> > # Updated umbrella JIRA issues
> > - ORC-744 (LazyIO of non-filter columns) is resolved.
> > - ORC-731 (Improve `Java Tools`) is resolved.
> > - ORC-798 (Add `@since` tag to public interfaces and classes) landed 3
> > patches.
> > - ORC-979 (C++ API QA) landed 4 patches but has ORC-968 as a blocker.
> > 
> > # Other resolved blocker issues
> > - ORC-965 (Fix ZSTD 'Overflow detected' failure) is fixed at both 1.7.0/
> > 1.6.11.
> > 
> > Best,
> > Dongjoon.
> > 
> 


[GitHub] [orc] guiyanakuang commented on pull request #895: ORC-982: Extract checkstyle to a single file, help newcomers check code style

2021-09-03 Thread GitBox


guiyanakuang commented on pull request #895:
URL: https://github.com/apache/orc/pull/895#issuecomment-912636483


   Thank you very much @dongjoon-hyun, thanks for the review and the format fix.






[GitHub] [orc] dongjoon-hyun merged pull request #893: ORC-980: Filter processing respects the case-sensitivity flag

2021-09-03 Thread GitBox


dongjoon-hyun merged pull request #893:
URL: https://github.com/apache/orc/pull/893


   






[GitHub] [orc] dongjoon-hyun commented on a change in pull request #893: ORC-980: Filter processing respects the case-sensitivity flag

2021-09-03 Thread GitBox


dongjoon-hyun commented on a change in pull request #893:
URL: https://github.com/apache/orc/pull/893#discussion_r701992522



##
File path: java/core/src/test/org/apache/orc/TestRowFilteringIOSkip.java
##
@@ -570,6 +570,29 @@ public void schemaEvolutionLong2StringColumn() throws IOException {
 assertEquals(1, rowCount);
   }
 
+  @Test
+  public void readCaseInsensitive() throws IOException {

Review comment:
   Thank you so much!








[GitHub] [orc] dongjoon-hyun merged pull request #895: ORC-982: Extract checkstyle to a single file, help newcomers check code style

2021-09-03 Thread GitBox


dongjoon-hyun merged pull request #895:
URL: https://github.com/apache/orc/pull/895


   






[GitHub] [orc] dongjoon-hyun commented on a change in pull request #895: ORC-982: Extract checkstyle to a single file, help newcomers check code style

2021-09-03 Thread GitBox


dongjoon-hyun commented on a change in pull request #895:
URL: https://github.com/apache/orc/pull/895#discussion_r701985809



##
File path: java/checkstyle.xml
##
@@ -0,0 +1,57 @@
+
+
+

[GitHub] [orc] pavibhai commented on a change in pull request #893: ORC-980: Filter processing respects the case-sensitivity flag

2021-09-03 Thread GitBox


pavibhai commented on a change in pull request #893:
URL: https://github.com/apache/orc/pull/893#discussion_r701973669



##
File path: java/core/src/test/org/apache/orc/TestRowFilteringIOSkip.java
##
@@ -570,6 +570,29 @@ public void schemaEvolutionLong2StringColumn() throws IOException {
 assertEquals(1, rowCount);
   }
 
+  @Test
+  public void readCaseInsensitive() throws IOException {

Review comment:
   The default value is true for case-sensitivity so all the other tests 
are case-sensitive tests. I added an explicit failure test with 
case-sensitivity when the name is not found.
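
A minimal sketch of what the flag changes, assuming a flat list of field names. `ColumnLookupSketch` and `findColumn` are hypothetical helpers for illustration, not ORC code.

```java
import java.util.List;

// Hypothetical sketch, not ORC code.
class ColumnLookupSketch {
  // Returns the index of the first matching field, or -1 when no name matches.
  static int findColumn(List<String> fieldNames, String name, boolean isCaseAware) {
    for (int i = 0; i < fieldNames.size(); i++) {
      String candidate = fieldNames.get(i);
      boolean matches = isCaseAware
          ? candidate.equals(name)             // case-sensitive comparison
          : candidate.equalsIgnoreCase(name);  // case-insensitive comparison
      if (matches) {
        return i;
      }
    }
    return -1;
  }
}
```

In this sketch, looking up `name` in a schema that declares `Name` fails when `isCaseAware` is true and succeeds when it is false, which mirrors the explicit failure test described above.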








[GitHub] [orc] pavibhai commented on a change in pull request #893: ORC-980: Filter processing respects the case-sensitivity flag

2021-09-03 Thread GitBox


pavibhai commented on a change in pull request #893:
URL: https://github.com/apache/orc/pull/893#discussion_r701973669



##
File path: java/core/src/test/org/apache/orc/TestRowFilteringIOSkip.java
##
@@ -570,6 +570,29 @@ public void schemaEvolutionLong2StringColumn() throws IOException {
 assertEquals(1, rowCount);
   }
 
+  @Test
+  public void readCaseInsensitive() throws IOException {

Review comment:
   The default value is true for case-sensitivity, so all the other tests 
are case-sensitive tests. I added an explicit failure test with 
case-sensitivity when the name is not found.








[GitHub] [orc] pavibhai commented on a change in pull request #893: ORC-980: Filter processing respects the case-sensitivity flag

2021-09-03 Thread GitBox


pavibhai commented on a change in pull request #893:
URL: https://github.com/apache/orc/pull/893#discussion_r701964316



##
File path: java/core/src/java/org/apache/orc/impl/RecordReaderImpl.java
##
@@ -283,6 +283,7 @@ protected RecordReaderImpl(ReaderImpl fileReader,
     Consumer filterCallBack = null;
     BatchFilter filter = FilterFactory.createBatchFilter(options,
                                                          evolution.getReaderBaseSchema(),
+                                                         evolution.isSchemaEvolutionCaseAware,

Review comment:
   Good point, changed.








[GitHub] [orc] guiyanakuang commented on a change in pull request #895: ORC-982: Extract checkstyle to a single file, help newcomers check code style

2021-09-03 Thread GitBox


guiyanakuang commented on a change in pull request #895:
URL: https://github.com/apache/orc/pull/895#discussion_r701890984



##
File path: java/checkstyle.xml
##
@@ -0,0 +1,57 @@
+
+
+https://checkstyle.org/dtds/configuration_1_2.dtd;>
+
+
+

Review comment:
   Fixed in ec7c471.








[GitHub] [orc] guiyanakuang commented on a change in pull request #895: ORC-982: Extract checkstyle to a single file, help newcomers check code style

2021-09-03 Thread GitBox


guiyanakuang commented on a change in pull request #895:
URL: https://github.com/apache/orc/pull/895#discussion_r701823066



##
File path: java/checkstyle.xml
##
@@ -0,0 +1,57 @@
+
+
+https://checkstyle.org/dtds/configuration_1_2.dtd;>
+
+
+

Review comment:
   My IDE's XML indentation setting defaults to 4 spaces. I'll fix this 
later.








[GitHub] [orc] dongjoon-hyun commented on a change in pull request #893: ORC-980: Filter processing respects the case-sensitivity flag

2021-09-03 Thread GitBox


dongjoon-hyun commented on a change in pull request #893:
URL: https://github.com/apache/orc/pull/893#discussion_r701812550



##
File path: java/core/src/test/org/apache/orc/TestRowFilteringIOSkip.java
##
@@ -570,6 +570,29 @@ public void schemaEvolutionLong2StringColumn() throws IOException {
 assertEquals(1, rowCount);
   }
 
+  @Test
+  public void readCaseInsensitive() throws IOException {

Review comment:
   Do we have case-sensitive test coverage?








[GitHub] [orc] dongjoon-hyun commented on a change in pull request #893: ORC-980: Filter processing respects the case-sensitivity flag

2021-09-03 Thread GitBox


dongjoon-hyun commented on a change in pull request #893:
URL: https://github.com/apache/orc/pull/893#discussion_r701811596



##
File path: java/core/src/java/org/apache/orc/impl/RecordReaderImpl.java
##
@@ -283,6 +283,7 @@ protected RecordReaderImpl(ReaderImpl fileReader,
     Consumer filterCallBack = null;
     BatchFilter filter = FilterFactory.createBatchFilter(options,
                                                          evolution.getReaderBaseSchema(),
+                                                         evolution.isSchemaEvolutionCaseAware,

Review comment:
   For consistency, could you use the getter function which is added by 
this PR?
   ```
   public boolean isSchemaEvolutionCaseAware()
   ```








[GitHub] [orc] dongjoon-hyun commented on a change in pull request #893: ORC-980: Filter processing respects the case-sensitivity flag

2021-09-03 Thread GitBox


dongjoon-hyun commented on a change in pull request #893:
URL: https://github.com/apache/orc/pull/893#discussion_r701811596



##
File path: java/core/src/java/org/apache/orc/impl/RecordReaderImpl.java
##
@@ -283,6 +283,7 @@ protected RecordReaderImpl(ReaderImpl fileReader,
     Consumer filterCallBack = null;
     BatchFilter filter = FilterFactory.createBatchFilter(options,
                                                          evolution.getReaderBaseSchema(),
+                                                         evolution.isSchemaEvolutionCaseAware,

Review comment:
   For consistency, could you use the getter function?








[GitHub] [orc] dongjoon-hyun commented on a change in pull request #895: ORC-982: Extract checkstyle to a single file, help newcomers check code style

2021-09-03 Thread GitBox


dongjoon-hyun commented on a change in pull request #895:
URL: https://github.com/apache/orc/pull/895#discussion_r701808648



##
File path: java/checkstyle.xml
##
@@ -0,0 +1,57 @@
+
+
+https://checkstyle.org/dtds/configuration_1_2.dtd;>
+
+
+

Review comment:
   Shall we use two-space indentation?








[GitHub] [orc] guiyanakuang opened a new pull request #895: ORC-982: Extract checkstyle to a single file, help newcomers check code style

2021-09-03 Thread GitBox


guiyanakuang opened a new pull request #895:
URL: https://github.com/apache/orc/pull/895


   
   
   ### What changes were proposed in this pull request?
   
   Extract checkstyle to a single file.
   Added tips to coding.md.
   
   ### Why are the changes needed?
   
   The [CheckStyle-IDEA](https://plugins.jetbrains.com/plugin/1065-checkstyle-idea) 
plugin makes it very simple to load this checkstyle.xml. This way you see 
checkstyle errors/warnings while you are coding.
   
![image](https://user-images.githubusercontent.com/4069905/131971923-a08b9520-2a9d--844f-a5e3e1396e57.png)
   
   ### How was this patch tested?
   
   Pass the CIs.






[jira] [Created] (ORC-982) Extract checkstyle to a single file, help newcomers check code style

2021-09-03 Thread Yiqun Zhang (Jira)
Yiqun Zhang created ORC-982:
---

 Summary: Extract checkstyle to a single file, help newcomers check 
code style
 Key: ORC-982
 URL: https://issues.apache.org/jira/browse/ORC-982
 Project: ORC
  Issue Type: Improvement
  Components: Java
Affects Versions: 1.8.0
Reporter: Yiqun Zhang
 Fix For: 1.8.0
 Attachments: screenshot-1.png

Extract checkstyle to a single file, help newcomers check code style.
 The [CheckStyle-IDEA|https://plugins.jetbrains.com/plugin/1065-checkstyle-idea] 
plugin makes it very simple to load this checkstyle.xml. This way you see 
checkstyle errors/warnings while you are coding.
  


