[ https://issues.apache.org/jira/browse/HIVE-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713356#action_12713356 ]

Raghotham Murthy commented on HIVE-467:
---------------------------------------

looks good. will commit once tests pass.

> Scratch data location should be on different filesystems for different types 
> of intermediate data
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-467
>                 URL: https://issues.apache.org/jira/browse/HIVE-467
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>         Environment: S3/EC2
>            Reporter: Joydeep Sen Sarma
>            Assignee: Joydeep Sen Sarma
>         Attachments: hive-467.3.patch, hive-467.4.patch, hive-467.5.patch, 
> hive-467.patch.1, hive-467.patch.2
>
>
> Currently Hive uses the same scratch directory/path for all sorts of 
> temporary and intermediate data. This is problematic:
> 1. The temporary location for writing out DDL output should just be a temp 
> file on the local file system. This removes the dependence of metadata and 
> browsing operations on a functioning hadoop cluster.
> 2. The temporary location for intermediate map-reduce data should be on the 
> default file system (which is typically the hdfs instance on the compute 
> cluster).
> 3. The temporary location for data that needs to be 'moved' into tables 
> should be on the same file system as the table's location (the table's 
> location may not be on the same hdfs instance as the processing cluster).
> i.e., local storage, map-reduce intermediate storage and table storage should 
> be distinguished. Without this distinction, using Hive in environments like 
> S3/EC2 causes problems. In such an environment, I would like to be able to:
> - do metadata operations without a provisioned hadoop cluster (using data 
> stored in S3 and a metastore on local disk)
> - attach to a provisioned hadoop cluster and run queries
> - store data back into tables that are created over the S3 file system
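
The three-way distinction in the quoted description could be sketched as follows. This is an illustrative selector, not Hive's actual code; the class, enum, and field names are all hypothetical stand-ins for Hive's real configuration values (e.g. fs.default.name and the table's metastore location).

```java
import java.net.URI;

// Hypothetical sketch of the scratch-location distinction the issue asks
// for: pick a filesystem per kind of intermediate data, instead of one
// shared scratch directory. Names are illustrative, not from Hive.
public class ScratchDirSelector {
    enum DataKind { DDL_OUTPUT, MAPRED_INTERMEDIATE, TABLE_LOAD }

    private final URI localTmp;   // e.g. file:///tmp - local FS for DDL output
    private final URI defaultFs;  // e.g. hdfs://nn:8020 - compute cluster's FS

    public ScratchDirSelector(URI localTmp, URI defaultFs) {
        this.localTmp = localTmp;
        this.defaultFs = defaultFs;
    }

    // tableLocation is only consulted for TABLE_LOAD data; it may live on a
    // different filesystem (e.g. s3://...) than the compute cluster's hdfs.
    public URI scratchFor(DataKind kind, URI tableLocation) {
        switch (kind) {
            case DDL_OUTPUT:
                return localTmp;       // no running hadoop cluster required
            case MAPRED_INTERMEDIATE:
                return defaultFs;      // hdfs of the processing cluster
            case TABLE_LOAD:
                // same filesystem as the table, so the final 'move' into the
                // table is a cheap rename rather than a cross-FS copy
                return tableLocation;
            default:
                throw new IllegalArgumentException("unknown kind: " + kind);
        }
    }
}
```

With a selector like this, DDL output never touches the cluster, and data destined for an S3-backed table is staged on S3 rather than on the cluster's hdfs.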

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.