[GitHub] orc pull request #154: ORC-228 Make MemoryManagerImpl.ROWS_BETWEEN_CHECKS co...

2017-08-11 Thread prasanthj
Github user prasanthj commented on a diff in the pull request:

https://github.com/apache/orc/pull/154#discussion_r132805602
  
--- Diff: java/core/src/java/org/apache/orc/impl/MemoryManagerImpl.java ---
@@ -81,6 +81,7 @@ public Thread getOwner() {
*/
   public MemoryManagerImpl(Configuration conf) {
 double maxLoad = OrcConf.MEMORY_POOL.getDouble(conf);
+ROWS_BETWEEN_CHECKS = OrcConf.ROWS_BETWEEN_CHECKS.getLong(conf);
--- End diff --

I don't think we want to support too large an interval for this. A very
high value delays the memory check, which is bad (it is better to flush often
than to skip the flush and fail).
1 to 1 may be a good range. Also, please note in the description that a very
low value is for testing only: it can cause premature flushes in some cases
and generate sub-optimal ORC files.
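
To make the trade-off concrete, here is a minimal sketch of a counter that
triggers a memory check once every N rows. It is not the actual
MemoryManagerImpl code: the class PeriodicMemoryCheck and its methods are
hypothetical, and only the idea of a rows-between-checks interval comes from
the diff above.

```java
// Hypothetical sketch; not the real org.apache.orc.impl.MemoryManagerImpl.
class PeriodicMemoryCheck {
  private final long rowsBetweenChecks;
  private long rowsSinceLastCheck = 0;
  private long checksPerformed = 0;

  PeriodicMemoryCheck(long rowsBetweenChecks) {
    if (rowsBetweenChecks < 1) {
      throw new IllegalArgumentException("interval must be at least 1");
    }
    this.rowsBetweenChecks = rowsBetweenChecks;
  }

  /** Records newly added rows; returns true if a memory check was triggered. */
  boolean addedRows(long rows) {
    rowsSinceLastCheck += rows;
    if (rowsSinceLastCheck >= rowsBetweenChecks) {
      rowsSinceLastCheck = 0;
      checksPerformed++; // a real writer would decide here whether to flush
      return true;
    }
    return false;
  }

  long getChecksPerformed() {
    return checksPerformed;
  }

  public static void main(String[] args) {
    PeriodicMemoryCheck small = new PeriodicMemoryCheck(1);
    PeriodicMemoryCheck large = new PeriodicMemoryCheck(5000);
    for (int i = 0; i < 10000; i++) {
      small.addedRows(1);
      large.addedRows(1);
    }
    System.out.println(small.getChecksPerformed()); // prints 10000
    System.out.println(large.getChecksPerformed()); // prints 2
  }
}
```

With an interval of 1 the check runs on every row, which is why a very low
value mostly makes sense for tests; with a very large interval the writer can
buffer many rows before anything looks at memory usage.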


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] orc pull request #154: ORC-228 Make MemoryManagerImpl.ROWS_BETWEEN_CHECKS co...

2017-08-11 Thread ekoifman
Github user ekoifman commented on a diff in the pull request:

https://github.com/apache/orc/pull/154#discussion_r132804794
  
--- Diff: java/core/src/java/org/apache/orc/impl/MemoryManagerImpl.java ---
@@ -81,6 +81,7 @@ public Thread getOwner() {
*/
   public MemoryManagerImpl(Configuration conf) {
 double maxLoad = OrcConf.MEMORY_POOL.getDouble(conf);
+ROWS_BETWEEN_CHECKS = OrcConf.ROWS_BETWEEN_CHECKS.getLong(conf);
--- End diff --

The default is set in OrcConf: 5000.
What would be a reasonable range of valid values? [1, ?]




[GitHub] orc issue #154: ORC-228 Make MemoryManagerImpl.ROWS_BETWEEN_CHECKS configura...

2017-08-11 Thread ekoifman
Github user ekoifman commented on the issue:

https://github.com/apache/orc/pull/154
  
@prasanthj could you review, please?




[GitHub] orc pull request #154: ORC-228 Make MemoryManagerImpl.ROWS_BETWEEN_CHECKS co...

2017-08-11 Thread prasanthj
Github user prasanthj commented on a diff in the pull request:

https://github.com/apache/orc/pull/154#discussion_r132804639
  
--- Diff: java/core/src/java/org/apache/orc/impl/MemoryManagerImpl.java ---
@@ -81,6 +81,7 @@ public Thread getOwner() {
*/
   public MemoryManagerImpl(Configuration conf) {
 double maxLoad = OrcConf.MEMORY_POOL.getDouble(conf);
+ROWS_BETWEEN_CHECKS = OrcConf.ROWS_BETWEEN_CHECKS.getLong(conf);
--- End diff --

Validate the value here? Fall back to the default when the value is invalid,
and maybe log a warning?
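
One way this suggestion could look is sketched below. The helper
IntervalValidator, its validate method, and the bounds passed to it are
assumptions for illustration, not ORC's API; the fallback of 5000 is the
OrcConf default mentioned elsewhere in this thread, and the upper bound is a
placeholder because the thread does not settle on one.

```java
import java.util.logging.Logger;

// Hypothetical helper: the class name, method, and bounds are assumptions
// for illustration and are not part of ORC's API.
class IntervalValidator {
  private static final Logger LOG =
      Logger.getLogger(IntervalValidator.class.getName());

  /** Returns the configured value if it lies in [min, max], else the fallback. */
  static long validate(long configured, long min, long max, long fallback) {
    if (configured < min || configured > max) {
      LOG.warning("Invalid rows-between-checks value " + configured
          + "; expected [" + min + ", " + max + "], using " + fallback);
      return fallback;
    }
    return configured;
  }

  public static void main(String[] args) {
    long assumedMax = 10000; // placeholder upper bound, not from the thread
    System.out.println(validate(0, 1, assumedMax, 5000));    // prints 5000
    System.out.println(validate(2500, 1, assumedMax, 5000)); // prints 2500
  }
}
```

Falling back to the default, rather than throwing, keeps a misconfigured job
running while the warning still makes the bad setting visible in the logs.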




[GitHub] orc pull request #154: ORC-228 Make MemoryManagerImpl.ROWS_BETWEEN_CHECKS co...

2017-08-11 Thread ekoifman
GitHub user ekoifman opened a pull request:

https://github.com/apache/orc/pull/154

ORC-228 Make MemoryManagerImpl.ROWS_BETWEEN_CHECKS configurable



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ekoifman/orc ORC-228

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/orc/pull/154.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #154


commit ac56bacb0f58c5703cb7fb55ef8b70b7a08cd8a3
Author: Eugene Koifman 
Date:   2017-08-11T23:58:10Z

ORC-228 Make MemoryManagerImpl.ROWS_BETWEEN_CHECKS configurable






Re: [DISCUSS] ORC 2.0

2017-08-11 Thread Gopal Vijayaraghavan
Hi,

> My intention is that we can iterate on the UNSTABLE-PRE-2.0 format without
> cross-version compatibility. It will only be used for developer testing. 

Sounds good. I tested that Hive can communicate this to ORC correctly.

set hive.exec.orc.write.format="UNSTABLE-PRE-2.0";

offers a very loosely coupled connectivity for the new features being tested.

Cheers,
Gopal





Re: [DISCUSS] ORC 2.0

2017-08-11 Thread Owen O'Malley
Ok, I created ORC-229 https://issues.apache.org/jira/browse/ORC-229 so that
we'll have a new OrcFile.Version of UNSTABLE-PRE-2.0. If you look at the
associated pull request, you can see the comments in the code are pretty
clear that users should stay away. I also added a logged warning when the
writer uses that version.

My intention is that we can iterate on the UNSTABLE-PRE-2.0 format without
cross-version compatibility. It will only be used for developer testing. As
part of the ORC 2.0 release, we can delete that version and move to a new
2.0 version.
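
The approach described above, a dedicated unstable version plus a logged
warning when a writer selects it, might look roughly like the following
sketch. It is not the code from the ORC-229 pull request: the enum
FileVersionSketch and its methods are hypothetical, and only the version
names 0.11, 0.12, and UNSTABLE-PRE-2.0 come from this thread.

```java
import java.util.logging.Logger;

// Hypothetical sketch; not the real OrcFile.Version from the pull request.
enum FileVersionSketch {
  V_0_11("0.11"),
  V_0_12("0.12"),
  // Developer testing only; no cross-version compatibility is promised.
  UNSTABLE_PRE_2_0("UNSTABLE-PRE-2.0");

  private static final Logger LOG =
      Logger.getLogger(FileVersionSketch.class.getName());

  private final String versionName;

  FileVersionSketch(String versionName) {
    this.versionName = versionName;
  }

  String getVersionName() {
    return versionName;
  }

  boolean isUnstable() {
    return this == UNSTABLE_PRE_2_0;
  }

  /** Looks up a version by its configured name. */
  static FileVersionSketch byName(String name) {
    for (FileVersionSketch v : values()) {
      if (v.versionName.equals(name)) {
        return v;
      }
    }
    throw new IllegalArgumentException("Unknown ORC version " + name);
  }

  /** Resolves the writer's version and warns if it is the unstable one. */
  static FileVersionSketch chooseForWriter(String name) {
    FileVersionSketch v = byName(name);
    if (v.isUnstable()) {
      LOG.warning("Writing with " + v.versionName
          + ": files may not be readable by any released ORC version");
    }
    return v;
  }
}
```

Keeping the unstable format behind a single named constant also makes it easy
to delete in one place when the real 2.0 version is introduced.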

Thoughts?

.. Owen

On Tue, Aug 8, 2017 at 12:13 AM, Gopal Vijayaraghavan 
wrote:

> Hi,
>
> > > Let me make sure I have the backwards compatibility straight.  If a
> user
> > > switches to ORC 2.0, he could choose to continue writing in older
> formats
> > > so that his old tools could read it
> >
> >Yes, exactly.
>
> To chime in on Owen's point, the development process has a slight wrinkle
> in it, which we avoided in the 0.11 -> 0.12 migration due to ORC being
> embedded in Hive.
>
> The feature addition is two-fold - the new features are available only
> when a user flips the writer versions.
>
> There is no feature flag for reader versions, so the readers have to keep
> up to date with the writer changes (or just fail for the "blackholed" ones,
> with good errors).
>
> Due to the split between projects, I expect to see a two-step development
> cycle, to clean up the integration pathways before the ABI is frozen in 2.0.
>
> The entire process can be gated on the writer version - during the
> development process, there will be an experimental version (1.5?) and a
> stable version.
>
> I have no interest in ever supporting an actual 1.5 version data setup in
> ORC, but for the sake of integration testing the 1.5->2.0 writer versions
> are extremely useful stepping stones towards a multi-project dependency
> like ORC.
>
> Once the integrations are all complete and the format can be frozen, ORC
> 2.0 releases can still disable the default writer version from being
> upgraded for another stable release.
>
> After the ecosystem has had all its upgrades, the default version gets
> flipped to 2.0, while the ability to write 0.12 files will still remain as
> an option, while all intermediate reader versions will get dropped.
>
> That's a bit more complicated than being part of Hive and sync'ing
> releases, but I think this gives ORC the flexibility to accept
> contributions from a wide community, supporting multi-project release
> timelines, without leaving the implementation full of reader
> implementations for many writer versions.
>
> Cheers,
> Gopal
>
>
>
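
The quoted point that readers have no feature flag and must either keep up
with writer versions or fail with good errors can be sketched as a simple
gate. The class ReaderVersionGate below is hypothetical and not part of ORC;
the version strings are the ones named in the quoted message.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch of a reader-side check; not ORC's reader code.
class ReaderVersionGate {
  private final SortedSet<String> supported;

  ReaderVersionGate(Collection<String> supportedVersions) {
    this.supported =
        Collections.unmodifiableSortedSet(new TreeSet<>(supportedVersions));
  }

  /** Throws a descriptive error if the file's writer version is unsupported. */
  void checkReadable(String fileVersion) {
    if (!supported.contains(fileVersion)) {
      throw new UnsupportedOperationException(
          "Cannot read ORC file written with version " + fileVersion
              + "; this reader supports " + supported);
    }
  }
}
```

A reader built this way rejects an intermediate development version (such as
the experimental 1.5 mentioned above) with a clear message instead of
misreading the file.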


[GitHub] orc pull request #153: ORC-229. Add an UNSTABLE-PRE-2.0 file format version.

2017-08-11 Thread omalley
GitHub user omalley opened a pull request:

https://github.com/apache/orc/pull/153

ORC-229. Add an UNSTABLE-PRE-2.0 file format version.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/omalley/orc orc-229

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/orc/pull/153.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #153


commit 5f2d8adf7db9e1650028352d4853e4762422c816
Author: Owen O'Malley 
Date:   2017-08-11T20:23:22Z

ORC-229. Add an UNSTABLE-PRE-2.0 file format version.






[GitHub] orc pull request #151: ORC-226 Support getWriterId in c++ reader interface

2017-08-11 Thread omalley
Github user omalley commented on a diff in the pull request:

https://github.com/apache/orc/pull/151#discussion_r132734446
  
--- Diff: c++/include/orc/Reader.hh ---
@@ -288,6 +288,17 @@ namespace orc {
 virtual uint64_t getCompressionSize() const = 0;
 
 /**
+ * Get ID of writer that generated the file.
+ * Current availiable Orc writers:
+ * 0 = ORC Java
+ * 1 = ORC C++
+ * 2 = Presto
+ * @param id out parameter for writer id
+ * @return true if writer id is availiable, false if otherwise
+ */
+virtual bool getWriterId(uint32_t & id) const = 0;
--- End diff --

No, it won't. Dain from the Presto team was the one that suggested adding 
the field.
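
For readers who want the writer ID table from the diff comment in executable
form, here is a small Java sketch of the same mapping. The class WriterIds
and its describe method are hypothetical; the three IDs and names are taken
from the comment in the diff, and the empty Optional plays the role of the
C++ method's false return value.

```java
import java.util.Optional;

// Hypothetical lookup mirroring the ID list in the diff comment.
class WriterIds {
  /** Returns the writer name for a known ID, or empty if the ID is unknown. */
  static Optional<String> describe(int id) {
    switch (id) {
      case 0:
        return Optional.of("ORC Java");
      case 1:
        return Optional.of("ORC C++");
      case 2:
        return Optional.of("Presto");
      default:
        return Optional.empty();
    }
  }

  public static void main(String[] args) {
    System.out.println(describe(2).orElse("unknown")); // prints Presto
    System.out.println(describe(7).orElse("unknown")); // prints unknown
  }
}
```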

