[GitHub] orc pull request #154: ORC-228 Make MemoryManagerImpl.ROWS_BETWEEN_CHECKS co...
Github user prasanthj commented on a diff in the pull request:
https://github.com/apache/orc/pull/154#discussion_r132805602

--- Diff: java/core/src/java/org/apache/orc/impl/MemoryManagerImpl.java ---
@@ -81,6 +81,7 @@ public Thread getOwner() {
    */
   public MemoryManagerImpl(Configuration conf) {
     double maxLoad = OrcConf.MEMORY_POOL.getDouble(conf);
+    ROWS_BETWEEN_CHECKS = OrcConf.ROWS_BETWEEN_CHECKS.getLong(conf);
--- End diff --

I don't think we want to support too large an interval for this. A very high value delays the memory check, which is bad (it is better to flush often than to not flush and fail). 1 to 1 may be a good range. Also, please note in the description that a very low value is meant for testing only: it can cause premature flushes in some cases and generate sub-optimal ORC files.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
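To make the trade-off in this comment concrete, here is a minimal sketch of a manager that counts the rows added across all writers and only runs its memory check once every `rowsBetweenChecks` rows. It is not ORC's actual MemoryManagerImpl; the class and method names (`RowCountingMemoryCheck`, `addedRows`, `getChecksPerformed`) are hypothetical, and the memory check itself is reduced to a counter.

```java
/** Hypothetical sketch: run a memory check once every N rows added by all writers. */
class RowCountingMemoryCheck {
  private final long rowsBetweenChecks;
  private long rowsAddedSinceCheck = 0;
  private long checksPerformed = 0;

  RowCountingMemoryCheck(long rowsBetweenChecks) {
    if (rowsBetweenChecks < 1) {
      throw new IllegalArgumentException("rowsBetweenChecks must be >= 1, got " + rowsBetweenChecks);
    }
    this.rowsBetweenChecks = rowsBetweenChecks;
  }

  /** Called after a writer adds rows; returns true if this call triggered a memory check. */
  synchronized boolean addedRows(int rows) {
    rowsAddedSinceCheck += rows;
    if (rowsAddedSinceCheck >= rowsBetweenChecks) {
      rowsAddedSinceCheck = 0;
      checksPerformed++;
      // A real manager would compare memory in use against the pool here
      // and ask writers to flush if it is over the limit.
      return true;
    }
    return false;
  }

  synchronized long getChecksPerformed() {
    return checksPerformed;
  }
}
```

With an interval of 5000, ten thousand single-row additions trigger two checks; with an interval of 1, every row triggers one. The larger the interval, the more rows can be buffered before memory pressure is noticed, which is the reviewer's concern about very high values.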
[GitHub] orc pull request #154: ORC-228 Make MemoryManagerImpl.ROWS_BETWEEN_CHECKS co...
Github user ekoifman commented on a diff in the pull request:
https://github.com/apache/orc/pull/154#discussion_r132804794

--- Diff: java/core/src/java/org/apache/orc/impl/MemoryManagerImpl.java ---
@@ -81,6 +81,7 @@ public Thread getOwner() {
    */
   public MemoryManagerImpl(Configuration conf) {
     double maxLoad = OrcConf.MEMORY_POOL.getDouble(conf);
+    ROWS_BETWEEN_CHECKS = OrcConf.ROWS_BETWEEN_CHECKS.getLong(conf);
--- End diff --

The default set in OrcConf is 5000. What would be a reasonable range of valid values? [1, ?]
[GitHub] orc issue #154: ORC-228 Make MemoryManagerImpl.ROWS_BETWEEN_CHECKS configura...
Github user ekoifman commented on the issue:
https://github.com/apache/orc/pull/154

@prasanthj, could you review this, please?
[GitHub] orc pull request #154: ORC-228 Make MemoryManagerImpl.ROWS_BETWEEN_CHECKS co...
Github user prasanthj commented on a diff in the pull request:
https://github.com/apache/orc/pull/154#discussion_r132804639

--- Diff: java/core/src/java/org/apache/orc/impl/MemoryManagerImpl.java ---
@@ -81,6 +81,7 @@ public Thread getOwner() {
    */
   public MemoryManagerImpl(Configuration conf) {
     double maxLoad = OrcConf.MEMORY_POOL.getDouble(conf);
+    ROWS_BETWEEN_CHECKS = OrcConf.ROWS_BETWEEN_CHECKS.getLong(conf);
--- End diff --

Validate the value here? And set the default value if it is invalid, and maybe log?
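One way to act on this suggestion is sketched below: take the configured interval and, if it falls outside an accepted range, log a warning and fall back to the default of 5000 mentioned in the thread. This is a hypothetical helper, not the code that was merged. A plain `long` stands in for the value read via `OrcConf.ROWS_BETWEEN_CHECKS.getLong(conf)`, and the upper bound `MAX_ROWS_BETWEEN_CHECKS` is an assumption for illustration, since the thread does not settle on one.

```java
import java.util.logging.Logger;

/** Hypothetical helper: validate the configured rows-between-checks interval. */
final class RowsBetweenChecksValidator {
  private static final Logger LOG = Logger.getLogger(RowsBetweenChecksValidator.class.getName());

  /** Default from OrcConf, as stated in the review thread. */
  static final long DEFAULT_ROWS_BETWEEN_CHECKS = 5000;
  /** Lower bound proposed in the thread. */
  static final long MIN_ROWS_BETWEEN_CHECKS = 1;
  /** Upper bound: an assumption for illustration only. */
  static final long MAX_ROWS_BETWEEN_CHECKS = 10_000;

  private RowsBetweenChecksValidator() {}

  /** Returns the configured value if it is in range; otherwise logs and returns the default. */
  static long validate(long configured) {
    if (configured < MIN_ROWS_BETWEEN_CHECKS || configured > MAX_ROWS_BETWEEN_CHECKS) {
      LOG.warning("Invalid rows between memory checks: " + configured
          + "; valid range is [" + MIN_ROWS_BETWEEN_CHECKS + ", " + MAX_ROWS_BETWEEN_CHECKS
          + "]. Using default " + DEFAULT_ROWS_BETWEEN_CHECKS + ".");
      return DEFAULT_ROWS_BETWEEN_CHECKS;
    }
    return configured;
  }
}
```

Falling back keeps a bad setting from silently disabling or over-triggering the check, as the reviewer suggests; throwing an `IllegalArgumentException` instead would surface the misconfiguration earlier.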
[GitHub] orc pull request #154: ORC-228 Make MemoryManagerImpl.ROWS_BETWEEN_CHECKS co...
GitHub user ekoifman opened a pull request:
https://github.com/apache/orc/pull/154

ORC-228 Make MemoryManagerImpl.ROWS_BETWEEN_CHECKS configurable

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ekoifman/orc ORC-228

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/orc/pull/154.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #154

commit ac56bacb0f58c5703cb7fb55ef8b70b7a08cd8a3
Author: Eugene Koifman
Date:   2017-08-11T23:58:10Z

    ORC-228 Make MemoryManagerImpl.ROWS_BETWEEN_CHECKS configurable
Re: [DISCUSS] ORC 2.0
Hi,

> My intention is that we can iterate on the UNSTABLE-PRE-2.0 format without
> cross-version compatibility. It will only be used for developer testing.

Sounds good. I tested that Hive can communicate this to ORC correctly:

set hive.exec.orc.write.format="UNSTABLE-PRE-2.0";

This offers very loosely coupled connectivity for the new features being tested.

Cheers,
Gopal
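The setting above hands a format name from Hive to the ORC writer. Below is a minimal sketch of how a writer might map such a name to a version and log a warning when the unstable one is selected, as Owen describes. The enum `FormatVersion`, its constants, and `fromLabel` are hypothetical stand-ins, not ORC's `OrcFile.Version`; only the version names "0.11", "0.12" and "UNSTABLE-PRE-2.0" come from this thread.

```java
import java.util.logging.Logger;

/** Hypothetical stand-in for a file format version enum; not ORC's actual OrcFile.Version. */
enum FormatVersion {
  V_0_11("0.11", false),
  V_0_12("0.12", false),
  UNSTABLE_PRE_2_0("UNSTABLE-PRE-2.0", true);

  private static final Logger LOG = Logger.getLogger(FormatVersion.class.getName());

  private final String label;
  private final boolean unstable;

  FormatVersion(String label, boolean unstable) {
    this.label = label;
    this.unstable = unstable;
  }

  String getLabel() {
    return label;
  }

  boolean isUnstable() {
    return unstable;
  }

  /** Maps a configured name such as "0.12" to a version, logging a warning for the unstable one. */
  static FormatVersion fromLabel(String configured) {
    for (FormatVersion v : values()) {
      if (v.label.equals(configured)) {
        if (v.unstable) {
          LOG.warning("Using " + v.label + ": for developer testing only,"
              + " with no cross-version compatibility.");
        }
        return v;
      }
    }
    throw new IllegalArgumentException("Unknown file format version: " + configured);
  }
}
```

Keeping the unstable version as an ordinary, clearly flagged enum value is what lets it be deleted wholesale when the real 2.0 version replaces it.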
Re: [DISCUSS] ORC 2.0
Ok, I created ORC-229 (https://issues.apache.org/jira/browse/ORC-229) so that we'll have a new OrcFile.Version of UNSTABLE-PRE-2.0. If you look at the associated pull request, you can see the comments in the code are pretty clear that users should stay away. I also added a logged warning when the writer uses that version.

My intention is that we can iterate on the UNSTABLE-PRE-2.0 format without cross-version compatibility. It will only be used for developer testing. As part of the ORC 2.0 release, we can delete that version and move to a new 2.0 version.

Thoughts?

.. Owen

On Tue, Aug 8, 2017 at 12:13 AM, Gopal Vijayaraghavan wrote:

> Hi,
>
> > > Let me make sure I have the backwards compatibility straight. If a user
> > > switches to ORC 2.0, he could choose to continue writing in older formats
> > > so that his old tools could read it
> >
> > Yes, exactly.
>
> To chime in on Owen's point, the development process has a slight wrinkle
> in it, which we avoided in the 0.11 -> 0.12 migration due to ORC being
> embedded in Hive.
>
> The feature addition is two-fold - the new features are available only
> when a user flips the writer versions.
>
> There is no feature flag for reader versions, so the readers have to keep
> up to date with the writer changes (or just fail for the "blackholed" ones,
> with good errors).
>
> Due to the split between projects, I expect to see a two-step development
> cycle, to clean up the integration pathways before the ABI is frozen in 2.0.
>
> The entire process can be gated on the writer version - during the
> development process, there will be an experimental version (1.5?) and a
> stable version.
>
> I have no interest in ever supporting an actual 1.5 version data setup in
> ORC, but for the sake of integration testing the 1.5->2.0 writer versions
> are extremely useful stepping stones towards a multi-project dependency
> like ORC.
>
> Once the integrations are all complete and the format can be frozen, ORC
> 2.0 releases can still disable the default writer version from being
> upgraded for another stable release.
>
> After the ecosystem has had all its upgrades, the default version gets
> flipped to 2.0, while the ability to write 0.12 files will still remain as
> an option, while all intermediate reader versions will get dropped.
>
> That's a bit more complicated than being part of Hive and sync'ing
> releases, but I think this gives ORC the flexibility to accept
> contributions from a wide community, supporting multi-project release
> timelines, without leaving the implementation full of reader
> implementations for many writer versions.
>
> Cheers,
> Gopal
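Gopal's point that readers have no feature flag of their own, and must either understand a file's version or fail with a good error, might look like the sketch below. The class `ReaderVersionGate`, its set of supported versions, and the default writer version are hypothetical illustrations of the end state described above (intermediate development versions dropped, 0.12 still the default writer output), not ORC code.

```java
import java.util.Set;

/** Hypothetical reader-side gate: accept only file versions this reader understands. */
final class ReaderVersionGate {
  /** Versions this hypothetical reader can decode; intermediate dev versions were dropped. */
  private static final Set<String> SUPPORTED = Set.of("0.11", "0.12", "2.0");
  /** Writers keep producing the old stable format by default until the ecosystem has upgraded. */
  static final String DEFAULT_WRITER_VERSION = "0.12";

  private ReaderVersionGate() {}

  /** Returns normally for supported versions; otherwise fails with a descriptive error. */
  static void checkReadable(String fileVersion) {
    if (!SUPPORTED.contains(fileVersion)) {
      throw new UnsupportedOperationException("File format version " + fileVersion
          + " is not supported by this reader; supported versions are " + SUPPORTED + ".");
    }
  }
}
```

Failing fast with the file's version in the message is the "good error" case; the alternative of silently misreading an unknown encoding is what the version gate exists to prevent.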
[GitHub] orc pull request #153: ORC-229. Add an UNSTABLE-PRE-2.0 file format version.
GitHub user omalley opened a pull request:
https://github.com/apache/orc/pull/153

ORC-229. Add an UNSTABLE-PRE-2.0 file format version.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/omalley/orc orc-229

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/orc/pull/153.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #153

commit 5f2d8adf7db9e1650028352d4853e4762422c816
Author: Owen O'Malley
Date:   2017-08-11T20:23:22Z

    ORC-229. Add an UNSTABLE-PRE-2.0 file format version.
[GitHub] orc pull request #151: ORC-226 Support getWriterId in c++ reader interface
Github user omalley commented on a diff in the pull request:
https://github.com/apache/orc/pull/151#discussion_r132734446

--- Diff: c++/include/orc/Reader.hh ---
@@ -288,6 +288,17 @@ namespace orc {
     virtual uint64_t getCompressionSize() const = 0;

     /**
+     * Get ID of writer that generated the file.
+     * Currently available ORC writers:
+     * 0 = ORC Java
+     * 1 = ORC C++
+     * 2 = Presto
+     * @param id out parameter for writer id
+     * @return true if writer id is available, false otherwise
+     */
+    virtual bool getWriterId(uint32_t & id) const = 0;
--- End diff --

No, it won't. Dain from the Presto team was the one who suggested adding the field.
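The proposed C++ `getWriterId` reports the writer through an out parameter plus a `bool` for availability. For comparison, here is a hedged Java sketch of the same idea using `Optional`, together with the ID-to-name table from the doc comment (0 = ORC Java, 1 = ORC C++, 2 = Presto). `WriterIds`, `writerId`, `describe`, and the nullable `Integer` input (null when no writer ID is recorded) are hypothetical; this is not the ORC Java reader API.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper mirroring the writer IDs listed in the C++ doc comment. */
final class WriterIds {
  private static final Map<Integer, String> NAMES = Map.of(
      0, "ORC Java",
      1, "ORC C++",
      2, "Presto");

  private WriterIds() {}

  /** Mirrors getWriterId: an empty Optional plays the role of returning false. */
  static Optional<Integer> writerId(Integer rawId) {
    return Optional.ofNullable(rawId);
  }

  /** Human-readable writer name, with fallbacks for missing or unknown IDs. */
  static String describe(Integer rawId) {
    return writerId(rawId)
        .map(id -> NAMES.getOrDefault(id, "unknown writer (" + id + ")"))
        .orElse("writer ID not recorded");
  }
}
```

Returning an empty value rather than failing lets callers handle files that carry no writer ID, which is the case the C++ `false` return covers.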