[ https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15899249#comment-15899249 ]

Lefty Leverenz commented on HIVE-15121:
---------------------------------------

Sergio Peña documented *hive.blobstore.optimizations.enabled* in a new Blobstore section of Hive Configuration Properties:

* [Configuration Properties -- Blobstore (i.e. Amazon S3) | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Blobstore(i.e.AmazonS3)]
* [Configuration Properties -- hive.blobstore.optimizations.enabled | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.blobstore.optimizations.enabled]

Removed the TODOC2.2 label.

> Last MR job in Hive should be able to write to a different scratch directory
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-15121
>                 URL: https://issues.apache.org/jira/browse/HIVE-15121
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Hive
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>             Fix For: 2.2.0
>
>         Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch,
> HIVE-15121.3.patch, HIVE-15121.patch, HIVE-15121.WIP.1.patch,
> HIVE-15121.WIP.2.patch, HIVE-15121.WIP.patch
>
>
> Hive should be able to configure all intermediate MR jobs to write to HDFS,
> but the final MR job to write to S3.
> This will be useful for implementing parallel renames on S3. The idea is that
> for a multi-job query, all intermediate MR jobs write to HDFS, and then the
> final job writes to S3. Writing to HDFS should be faster than writing to S3,
> so it makes more sense to write intermediate data to HDFS.
> The advantage is that any copying of data that needs to be done from the
> scratch directory to the final table directory can be done server-side,
> within the blobstore. The MoveTask simply renames data from the scratch
> directory to the final table location, which should translate to a
> server-side COPY request. This way HiveServer2 doesn't have to actually copy
> any data, it just tells the blobstore to do all the work.

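The scratch-directory choice described in the quoted issue can be illustrated with a minimal Python sketch. It is not Hive's implementation: the function names, the `.hive-staging` suffix, the scheme list, and the `optimizations_enabled` flag (loosely named after *hive.blobstore.optimizations.enabled*) are all assumptions made for illustration. Intermediate jobs get an HDFS scratch directory, and only the last job stages its output inside the blobstore, so that the later move can stay within the blobstore.

```python
from urllib.parse import urlparse

# Assumption: URI schemes treated as blobstores in this sketch.
BLOBSTORE_SCHEMES = {"s3", "s3a", "s3n"}


def is_blobstore(path: str) -> bool:
    """Return True if the path's URI scheme is one of the assumed blobstore schemes."""
    return urlparse(path).scheme.lower() in BLOBSTORE_SCHEMES


def scratch_dir_for_job(job_index: int, job_count: int, table_location: str,
                        hdfs_scratch: str, optimizations_enabled: bool = True) -> str:
    """Pick a scratch directory for one job of a multi-job query (hypothetical helper)."""
    is_final_job = job_index == job_count - 1
    if optimizations_enabled and is_final_job and is_blobstore(table_location):
        # Final job: stage output under the table location so the later
        # rename can be carried out inside the blobstore.
        return table_location.rstrip("/") + "/.hive-staging"
    # Intermediate jobs (or optimization disabled): write to HDFS.
    return hdfs_scratch


# A three-job query whose target table lives on S3 (example paths are made up).
plan = [
    scratch_dir_for_job(i, 3, "s3a://bucket/warehouse/t", "hdfs://nn/tmp/hive")
    for i in range(3)
]
```

In this sketch the first two jobs are assigned the HDFS scratch directory and only the third is pointed at the S3 staging path; a table that lives on HDFS keeps every job on HDFS.
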