[jira] [Commented] (VXQUERY-131) Supporting Hadoop data and cluster management

Vinshul Arora (JIRA) Sat, 07 Mar 2015 00:52:02 -0800

    [ 
https://issues.apache.org/jira/browse/VXQUERY-131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351485#comment-14351485
 ]


Vinshul Arora commented on VXQUERY-131:
---------------------------------------

Thanks Preston, that cleared up a lot of my doubts about the requirements of 
this idea. If I understand your reply correctly, I think we need to do 
something like this to get the code working as required:

Connect Apache VXQuery directly to HDFS (by coding a framework in which 
different sections of the XML data are correctly taken as input), run the 
query, store the results of that query in the distributed cache, and after 
that run a traditional Hadoop MapReduce job.
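
To make the "sections of XML data" step concrete, here is a minimal sketch of 
how a reader could decide which XML records belong to which byte range of a 
file. It is only an illustration under assumptions: an in-memory bytes object 
stands in for the HDFS file, records are assumed to be non-nested elements 
delimited by fixed start and end tags, and the names records_in_split and 
split_ranges are hypothetical, not part of VXQuery or Hadoop. The rule it uses 
is that a split owns every record whose start tag begins inside the split, and 
it may read past the split's end to finish the last record.

```python
from typing import Iterator, List, Tuple


def records_in_split(data: bytes, start: int, end: int,
                     start_tag: bytes, end_tag: bytes) -> Iterator[bytes]:
    """Yield each record whose start tag begins inside [start, end).

    Assumption: records are not nested and use fixed, exact tags.
    """
    pos = start
    while True:
        begin = data.find(start_tag, pos)
        # A record that begins at or after `end` belongs to a later split.
        if begin == -1 or begin >= end:
            return
        close = data.find(end_tag, begin + len(start_tag))
        if close == -1:
            return  # unterminated record: nothing complete to emit
        stop = close + len(end_tag)
        # Reading may run past `end` so the last record is complete.
        yield data[begin:stop]
        pos = stop


def split_ranges(size: int, split_size: int) -> List[Tuple[int, int]]:
    """Cut [0, size) into consecutive byte ranges of at most split_size."""
    return [(s, min(s + split_size, size)) for s in range(0, size, split_size)]
```

With this rule, each record is produced by exactly one split even when it 
straddles a boundary, so the splits can be processed in parallel without 
duplicating or dropping records.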

Modifications could be made to the part that writes the XML data (when data 
is written to HDFS after the query is executed), as that part of the code 
affects parallelism.

Am I heading in the right direction?

> Supporting Hadoop data and cluster management
> ---------------------------------------------
>
>                 Key: VXQUERY-131
>                 URL: https://issues.apache.org/jira/browse/VXQUERY-131
>             Project: VXQuery
>          Issue Type: Improvement
>            Reporter: Preston Carman
>            Assignee: Preston Carman
>              Labels: gsoc, gsoc2015, hadoop, java, mentor, xml
>
> Many organizations support Hadoop. It would be nice to be able to read data 
> from this source. The project will include creating a strategy (with the 
> mentor's guidance) for reading XML data from HDFS and implementing it. When 
> connecting VXQuery to HDFS, the strategy may need to consider how to read 
> sections of an XML file. 
> In addition, we could use YARN as our cluster manager. Apache Hadoop YARN 
> (Yet Another Resource Negotiator) would be a good cluster management tool for 
> VXQuery. If VXQuery can read data from HDFS, then why not also manage the 
> cluster with a tool provided by Hadoop? The solution would replace the 
> current custom Python scripts for cluster management.
> Goal
> - Read XML from HDFS
> - Manage the VXQuery cluster with YARN



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
