[ 
https://issues.apache.org/jira/browse/DRILL-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147194#comment-14147194
 ] 

Aditya Kishore commented on DRILL-1414:
---------------------------------------

So I have been thinking about a couple of ways to do it.

# Extend the {{org.apache.drill.exec.store.sys.PStore}} interface with two 
additional methods:
{code}
  public V getBlob(String key);
  public void putBlob(String key, V value);
{code}
Consumers can use these two methods to store large amounts of data that may 
not require frequent enumeration and are not suitable for storage on systems 
like ZooKeeper. A particular PStore implementation could choose to store the 
blob data differently from the primary value; for example, an HBase PStore 
provider could store it in a different column family, while a ZooKeeper PStore 
provider could store it on DFS (as this JIRA summary suggests).
The query profile can then be split into two parts: the small meta information 
about the query is stored with {{put()}}, while the fragment profiles are 
stored using {{putBlob()}}.
# Alternatively, we could handle this narrowly by just modifying 
{{org.apache.drill.exec.work.foreman.QueryStatus}} to split the profile and 
store the profile metadata separately from the individual query profile.
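
To make approach #1 concrete, here is a rough sketch, not the actual patch. 
{{BlobPStore}} is a hypothetical, simplified stand-in for the real PStore 
interface (which has more methods than shown), and {{InMemoryBlobPStore}} is an 
illustrative provider that only demonstrates keeping blobs apart from primary 
values.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for org.apache.drill.exec.store.sys.PStore;
// only the methods needed for this illustration are declared.
interface BlobPStore<V> {
  V get(String key);
  void put(String key, V value);

  // Proposed additions: large values that are rarely enumerated.
  V getBlob(String key);
  void putBlob(String key, V value);
}

// Illustrative in-memory provider (an assumption, not a real Drill class).
// It keeps blobs in a separate map, much as an HBase provider might use a
// second column family or a ZooKeeper provider might write them to DFS.
class InMemoryBlobPStore<V> implements BlobPStore<V> {
  private final Map<String, V> primary = new ConcurrentHashMap<>();
  private final Map<String, V> blobs = new ConcurrentHashMap<>();

  @Override public V get(String key) { return primary.get(key); }
  @Override public void put(String key, V value) { primary.put(key, value); }
  @Override public V getBlob(String key) { return blobs.get(key); }
  @Override public void putBlob(String key, V value) { blobs.put(key, value); }
}
```

With something like this, a consumer would call {{put()}} for the small query 
meta information and {{putBlob()}} for the fragment profiles under the same key.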

I am inclined to go with approach #1, as it will allow any future consumer to 
reuse it effortlessly. I already have a partial patch, excluding modifications 
to the Web UI, that I am currently testing. If I do not hear any concerns 
about approach #1, I'll post the patch for review shortly.

> Move profile storage to DFS rather than using PStore
> ----------------------------------------------------
>
>                 Key: DRILL-1414
>                 URL: https://issues.apache.org/jira/browse/DRILL-1414
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Jacques Nadeau
>            Assignee: Aditya Kishore
>             Fix For: 0.6.0
>
>
> PStores were really built for trivial configuration data, not large query 
> profiles.  As such, we should move to using the DFS for storage of query 
> profiles when in distributed mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
