date:20180318

[jira] [Commented] (PARQUET-1166) [API Proposal] Add GetRecordBatchReader in parquet/arrow/reader.h

2018-03-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16404402#comment-16404402
 ] 

ASF GitHub Bot commented on PARQUET-1166:
-

advancedxy commented on issue #445: [WIP] PARQUET-1166: Add 
GetRecordBatchReader in parquet/arrow/reader
URL: https://github.com/apache/parquet-cpp/pull/445#issuecomment-374107728
 
 
   ping @wesm @xhochy 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [API Proposal] Add GetRecordBatchReader in parquet/arrow/reader.h
> -
>
> Key: PARQUET-1166
> URL: https://issues.apache.org/jira/browse/PARQUET-1166
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-cpp
> Reporter: Xianjin YE
> Priority: Major
>
> Hi, I'd like to propose a new API to better support splittable reading of 
> Parquet files.
> The intent of this API is to allow selectively reading RowGroups (normally 
> contiguous, but they can be arbitrary as long as the row group indices are 
> sorted and unique, [1, 3, 5] for example). 
> The proposed API would be something like this:
> {code:java}
> ::arrow::Status GetRecordBatchReader(const std::vector<int>& row_group_indices,
>                                      std::shared_ptr<::arrow::RecordBatchReader>* out);
> 
> ::arrow::Status GetRecordBatchReader(const std::vector<int>& row_group_indices,
>                                      const std::vector<int>& column_indices,
>                                      std::shared_ptr<::arrow::RecordBatchReader>* out);
> {code}
> With the new API, a Parquet file can be split by RowGroup and processed 
> by multiple tasks (possibly on different hosts, like Map tasks in MapReduce).
> [~wesmckinn] [~xhochy] What do you think?
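
To make the splitting idea above concrete, here is a minimal sketch of how a scheduler might divide a file's row groups among tasks. SplitRowGroups is a hypothetical helper written for illustration and is not part of parquet-cpp; it uses only the standard library and assumes the number of row groups is already known (for example, from the file metadata). Each list it returns is sorted and unique, as the proposal requires.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical helper (not part of parquet-cpp): divide row groups
// [0, num_row_groups) into at most num_tasks contiguous index lists.
// Each list is sorted and unique; sizes differ by at most one.
std::vector<std::vector<int>> SplitRowGroups(int num_row_groups, int num_tasks) {
  std::vector<std::vector<int>> splits;
  if (num_row_groups <= 0 || num_tasks <= 0) return splits;

  // Never create more tasks than there are row groups.
  const int tasks = std::min(num_row_groups, num_tasks);
  const int base = num_row_groups / tasks;
  const int extra = num_row_groups % tasks;  // first `extra` tasks get one more

  int next = 0;
  for (int t = 0; t < tasks; ++t) {
    const int count = base + (t < extra ? 1 : 0);
    std::vector<int> indices;
    indices.reserve(count);
    for (int i = 0; i < count; ++i) indices.push_back(next++);
    splits.push_back(std::move(indices));
  }
  return splits;
}
```

Each task would then pass its own list as row_group_indices to the proposed GetRecordBatchReader overload and read only those row groups.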



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (PARQUET-1204) [C++] Less verbose logging from thirdparty toolchain

2018-03-18 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated PARQUET-1204:
-
Fix Version/s: (was: cpp-1.4.0)
   cpp-1.5.0

> [C++] Less verbose logging from thirdparty toolchain
> 
>
> Key: PARQUET-1204
> URL: https://issues.apache.org/jira/browse/PARQUET-1204
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-cpp
> Reporter: Wes McKinney
> Priority: Major
> Fix For: cpp-1.5.0
>
>
> Following work in ARROW-2095, ARROW-2096, elsewhere



