[ https://issues.apache.org/jira/browse/PIO-38?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15523861#comment-15523861 ]
Wojciech Indyk commented on PIO-38:
-----------------------------------

Hello [~Ziemin]!
Sorry for the late response. I would like to be able to provide events to PredictionIO from the place where I currently store them. As far as I can see, PredictionIO works with the Elasticsearch + HBase pair, so to use Elasticsearch as a backend I also need to use HBase as the event store. I don't know PredictionIO very well, so correct me if I'm wrong.
I don't want to use HBase, because it enlarges my technology stack and brings no benefit when training a model in batch. Parquet is better suited to this case: I append to my archive of events once a day and can then use a subset of that data to train a recommendation model, without duplicating the data in HBase.
Is that clear enough?

> add Apache Parquet as a data source
> -----------------------------------
>
>                 Key: PIO-38
>                 URL: https://issues.apache.org/jira/browse/PIO-38
>             Project: PredictionIO
>          Issue Type: New Feature
>            Reporter: Wojciech Indyk
>              Labels: features
>
> Apache Parquet (https://parquet.apache.org/) is a columnar storage format,
> natively supported by Apache Spark and very well suited to storing batch data
> (as an input) for a PredictionIO Engine.
> Parquet is very popular for archiving clickstream data, so supporting it would
> make it possible to use PredictionIO without an additional import (and
> duplication) of data into HBase.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)