[GitHub] orc pull request #41: ORC-58: Move code for reading rows from Reader to RowR...

2016-07-19 Thread omalley
Github user omalley commented on a diff in the pull request:

https://github.com/apache/orc/pull/41#discussion_r71443174
  
--- Diff: c++/src/Reader.cc ---
@@ -1062,37 +1062,36 @@ namespace orc {
 // PASS
   }
 
+  RowReader::~RowReader() {
+// PASS
+  }
+
   static const uint64_t DIRECTORY_SIZE_GUESS = 16 * 1024;
 
-  class ReaderImpl : public Reader {
+  class ReaderImpl;
+
+  class RowReaderImpl : public RowReader {
   private:
 const Timezone& localTimezone;
 
 // inputs
-std::unique_ptr stream;
-ReaderOptions options;
-const uint64_t fileLength;
-const uint64_t postscriptLength;
 std::vector selectedColumns;
-
+std::shared_ptr stream;
--- End diff --

I'd suggest creating a single class that has all of the state for the 
ReaderImpl. Then it can have the postscript, InputStream, etc. and be shared 
between the ReaderImpl and the set of RowReaderImpls that share the same 
ReaderImpl.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] orc issue #41: ORC-58: Move code for reading rows from Reader to RowReader

2016-07-19 Thread omalley
Github user omalley commented on the issue:

https://github.com/apache/orc/pull/41
  
The ReaderOptions needs to split in half too:

* ReaderOptions
  * setTailLocation
  * setErrorStream
  * setSerializedFooter

* RowReaderOptions
  * include
  * range
  * throwOnHive11DecimalOverflow
  * forcedScaleOnHive11Decimal


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] orc pull request #49: ORC-84. Create a separate java tool module

2016-07-19 Thread omalley
GitHub user omalley opened a pull request:

https://github.com/apache/orc/pull/49

ORC-84. Create a separate java tool module

This creates a new java/tools directory and moves the Java ORC tools into 
it.
* An uber jar is much easier to use from the command line
* It reduces the dependencies for orc-core, which no longer needs the JSON 
or CLI libraries.
* It creates a new driver class that we can add additional tools to.
* It pulls the json data printer out of FileDump.java

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/omalley/orc orc-84

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/orc/pull/49.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #49


commit a8d1a472fe9a883cf9887a987c0b0b216d874b1a
Author: Owen O'Malley 
Date:   2016-07-19T18:27:39Z

ORC-84. Create a separate java tool module




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: Creating a tool java module

2016-07-19 Thread Owen O'Malley
Ok, I went ahead and created https://issues.apache.org/jira/browse/ORC-84
to track it.

.. Owen

On Tue, Jul 19, 2016 at 9:59 AM, Prasanth Jayachandran <
j.prasant...@gmail.com> wrote:

> +1
>
> Thanks
> Prasanth
>
>
>
>
> On Tue, Jul 19, 2016 at 9:59 AM -0700, "Matt Burgess" 
> wrote:
>
>
>
>
>
>
>
>
>
>
> +1
>
> On Tue, Jul 19, 2016 at 12:51 PM, Elliot West  wrote:
> > +1
> >
> > We're currently having to maintain a very simple internal project that
> > delegates to FileDump so that we can peer into Orc files more easily
> > because our installed Hive CLI version lacks the more useful FileDump
> > features. This would be a welcome addition.
> >
> > On 19 July 2016 at 17:45, Owen O'Malley  wrote:
> >
> >> Hi all,
> >>I think it would be nice if we made a tool jar to make it easier to
> >> invoke the equivalent of orcfiledump from Hive. In particular, I propose
> >> that we create a java/tools directory and move FileDump and
> JsonFileDump in
> >> to it. We can also make an orc-tools.jar that is an uber jar and can be
> >> invoked easily from the command line.
> >>
> >> Thoughts?
> >>Owen
> >>
>
>
>
>
>
>


Re: Creating a tool java module

2016-07-19 Thread Prasanth Jayachandran
+1

Thanks
Prasanth




On Tue, Jul 19, 2016 at 9:59 AM -0700, "Matt Burgess"  
wrote:










+1

On Tue, Jul 19, 2016 at 12:51 PM, Elliot West  wrote:
> +1
>
> We're currently having to maintain a very simple internal project that
> delegates to FileDump so that we can peer into Orc files more easily
> because our installed Hive CLI version lacks the more useful FileDump
> features. This would be a welcome addition.
>
> On 19 July 2016 at 17:45, Owen O'Malley  wrote:
>
>> Hi all,
>>I think it would be nice if we made a tool jar to make it easier to
>> invoke the equivalent of orcfiledump from Hive. In particular, I propose
>> that we create a java/tools directory and move FileDump and JsonFileDump in
>> to it. We can also make an orc-tools.jar that is an uber jar and can be
>> invoked easily from the command line.
>>
>> Thoughts?
>>Owen
>>







[GitHub] orc pull request #41: ORC-58: Move code for reading rows from Reader to RowR...

2016-07-19 Thread omalley
Github user omalley commented on a diff in the pull request:

https://github.com/apache/orc/pull/41#discussion_r71377966
  
--- Diff: c++/include/orc/Reader.hh ---
@@ -645,7 +645,72 @@ namespace orc {
   };
 
   /**
-   * The interface for reading ORC files.
+   * The interface for reading rows in ORC files.
+   * This is an an abstract class that will subclassed as necessary.
+   */
+  class RowReader {
+  public:
+virtual ~RowReader();
+/**
+ * Get the selected type of the rows in the file. The file's row type
+ * is projected down to just the selected columns. Thus, if the file's
+ * type is struct and the selected
+ * columns are "col0,col2" the selected type would be
+ * struct.
+ * @return the root type
+ */
+virtual const Type& getSelectedType() const = 0;
+
+/**
+ * Get the selected columns of the file.
+ */
+virtual const std::vector getSelectedColumns() const = 0;
+
--- End diff --

You have some trailing spaces. You can probably configure your IDE to 
automatically trim them.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Creating a tool java module

2016-07-19 Thread Owen O'Malley
Hi all,
   I think it would be nice if we made a tool jar to make it easier to
invoke the equivalent of orcfiledump from Hive. In particular, I propose
that we create a java/tools directory and move FileDump and JsonFileDump in
to it. We can also make an orc-tools.jar that is an uber jar and can be
invoked easily from the command line.

Thoughts?
   Owen