[jira] Commented: (HIVE-1624) Patch to allows scripts in S3 location

2010-09-29 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916242#action_12916242
 ] 

He Yongqiang commented on HIVE-1624:


+1 running tests

> Patch to allows scripts in S3 location
> --
>
> Key: HIVE-1624
> URL: https://issues.apache.org/jira/browse/HIVE-1624
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Vaibhav Aggarwal
>Assignee: Vaibhav Aggarwal
> Attachments: HIVE-1624-2.patch, HIVE-1624-3.patch, HIVE-1624-4.patch, 
> HIVE-1624-5.patch, HIVE-1624.patch
>
>
> I want to submit a patch which allows user to run scripts located in S3.
> This patch enables Hive to download the hive scripts located in S3 buckets 
> and execute them. This saves users the effort of copying scripts to HDFS 
> before executing them.
> Thanks
> Vaibhav

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1624) Patch to allows scripts in S3 location

2010-09-28 Thread Vaibhav Aggarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915995#action_12915995
 ] 

Vaibhav Aggarwal commented on HIVE-1624:


Made the change.

Thanks
Vaibhav




[jira] Commented: (HIVE-1624) Patch to allows scripts in S3 location

2010-09-28 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915942#action_12915942
 ] 

He Yongqiang commented on HIVE-1624:


Mostly looks good. 

In your test case, can you put the new script file in
new Path(System.getProperty("test.data.dir", ".") + "file name") ?

By moving fetchFilesNotInLocalFilesystem to SessionState, you can keep 
getScriptProgName() etc. in SemanticAnalyzer by changing 
fetchFilesNotInLocalFilesystem's arguments to pass in the command etc. I am 
also OK with the current way.
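
A minimal sketch of the test-file suggestion, assuming plain java.io.File rather than Hadoop's Path so that it runs without Hadoop on the classpath. The class name TestDataLocation and the file name sample_script.py are hypothetical. Concatenating the property value and the file name directly would need a separator between the two; the two-argument File constructor inserts one.

```java
import java.io.File;

public class TestDataLocation {
    // Resolve a test file under the directory named by the "test.data.dir"
    // system property, falling back to the current directory when it is unset.
    public static File resolve(String fileName) {
        String dataDir = System.getProperty("test.data.dir", ".");
        return new File(dataDir, fileName);
    }

    public static void main(String[] args) {
        // "sample_script.py" is a made-up name used only for illustration.
        System.out.println(resolve("sample_script.py").getPath());
    }
}
```

Running it with -Dtest.data.dir=some/dir places the file under that directory; without the property it resolves relative to the working directory.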




[jira] Commented: (HIVE-1624) Patch to allows scripts in S3 location

2010-09-28 Thread Vaibhav Aggarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915926#action_12915926
 ] 

Vaibhav Aggarwal commented on HIVE-1624:


Hi

I have attached a new patch.
I have added unit tests for DosToUnix and removed the logging code.
I kept fetchFilesNotInLocalFilesystem in SemanticAnalyzer as I felt 
that it was not shared code and also depended on calls to getScriptProgName(). 
I felt that SessionState calling into SemanticAnalyzer might not be a good idea.

Thanks
Vaibhav
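
For illustration, a minimal sketch of the kind of conversion a DosToUnix helper performs and of what a unit test for it might check. The attached patch is not reproduced in this thread, so the class LineEndingSketch and its method names are hypothetical and are not the patch's actual API.

```java
public class LineEndingSketch {
    // True when the text contains at least one Windows-style (CRLF) line ending.
    public static boolean hasDosLineEndings(String text) {
        return text.contains("\r\n");
    }

    // Replace every CRLF pair with a single LF; a lone CR or LF is left alone.
    public static String toUnix(String text) {
        return text.replace("\r\n", "\n");
    }

    public static void main(String[] args) {
        String dos = "#!/bin/sh\r\necho hello\r\n";
        String unix = toUnix(dos);
        System.out.println(hasDosLineEndings(dos));
        System.out.println(hasDosLineEndings(unix));
    }
}
```

A script whose first line ends in CRLF typically fails to start on Unix because the carriage return becomes part of the interpreter name, which is why a conversion step of this kind is useful before execution.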




[jira] Commented: (HIVE-1624) Patch to allows scripts in S3 location

2010-09-27 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915391#action_12915391
 ] 

He Yongqiang commented on HIVE-1624:


Great. Some nitpicks; sorry for not posting them in the previous comment. 
1) It seems there is still one line of logging code: "+
getConsole().printInfo("Testing " + value);"
2) Also, can you add a JUnit test for DosToUnix?
3) Do you think it may be better to move fetchFilesNotInLocalFilesystem to 
SessionState?




[jira] Commented: (HIVE-1624) Patch to allows scripts in S3 location

2010-09-23 Thread Vaibhav Aggarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914314#action_12914314
 ] 

Vaibhav Aggarwal commented on HIVE-1624:


Hi

I made the changes you had suggested.
Please review the patch.

Thanks
Vaibhav




[jira] Commented: (HIVE-1624) Patch to allows scripts in S3 location

2010-09-23 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914176#action_12914176
 ] 

He Yongqiang commented on HIVE-1624:


>>Should I modify it to be "hdfs:// || s3://" like path?
Yes. That will be a great start. We can add more if needed in future.

Also, please make sure that if a program whose path is neither hdfs nor s3 
cannot be found locally, the query does not fail in the semantic analyzer. 
Otherwise, it may break a lot of existing queries.
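
The rule agreed on here can be sketched as follows, assuming the check is a simple prefix test on the script path. The class ScriptLocationRule, the method shouldDownload, and the exact prefix list are hypothetical; the patch itself is not shown in this thread. A path without one of the listed schemes is left untouched, so a program that exists only on the cluster nodes does not cause a failure during semantic analysis.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ScriptLocationRule {
    // Schemes treated as remote. Assumed from the discussion; extend if needed.
    private static final List<String> REMOTE_PREFIXES =
            Arrays.asList("hdfs://", "s3://");

    // Returns true only for paths that start with one of the listed schemes.
    // Anything else (bare program names, local paths, other schemes) is
    // reported as "do not download" rather than as an error.
    public static boolean shouldDownload(String scriptPath) {
        if (scriptPath == null) {
            return false;
        }
        String lower = scriptPath.toLowerCase(Locale.ROOT);
        for (String prefix : REMOTE_PREFIXES) {
            if (lower.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The bucket and paths below are made-up examples.
        System.out.println(shouldDownload("s3://example-bucket/script.py"));
        System.out.println(shouldDownload("/usr/bin/python"));
    }
}
```

Compared with matching any "://", an explicit list is narrower but makes it harder to accidentally treat an unexpected string as a remote location.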




[jira] Commented: (HIVE-1624) Patch to allows scripts in S3 location

2010-09-23 Thread Vaibhav Aggarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914149#action_12914149
 ] 

Vaibhav Aggarwal commented on HIVE-1624:


So the current patch matches "://" in the file path before 
downloading the file.
Should I modify it to be "hdfs:// || s3://" like path?
The first one is a more generic case to match any file system.
I can implement it either way.
Please let me know what you prefer.

Thanks
Vaibhav




[jira] Commented: (HIVE-1624) Patch to allows scripts in S3 location

2010-09-22 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913895#action_12913895
 ] 

He Yongqiang commented on HIVE-1624:


For 2, sometimes it is actually a common case. For example, a user can use php 
without having the php program on the local machine. We can add a simple rule for 
downloading resource files, such as the path starting with the s3 scheme in this case.  




[jira] Commented: (HIVE-1624) Patch to allows scripts in S3 location

2010-09-22 Thread Vaibhav Aggarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913881#action_12913881
 ] 

Vaibhav Aggarwal commented on HIVE-1624:


I will remove some of the unnecessary log statements.

The patch consists of two parts:

1. It extends the add file/jar functionality to download remote files. I think 
it makes sense as is, since the user is expected to have access to the file from 
the client location. If that is not true, the patch will fail with an 
IOException, which will notify the user of the problem appropriately.

2. It eliminates the need for the user to run an extra add file command. I think 
that the current norm in Hive is for the user to execute the 'add file ' 
command to add a resource before using it in the transform function. This patch 
allows the user to specify a resource directly instead of doing it in two steps.

The patch does not attempt to address the case when the script exists in a 
remote location not accessible to the client.

Please let me know what you think.

Thanks
Vaibhav




[jira] Commented: (HIVE-1624) Patch to allows scripts in S3 location

2010-09-22 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913878#action_12913878
 ] 

He Yongqiang commented on HIVE-1624:


Looks good basically. Need to remove some unneeded logging information.

One main problem here is determining when to download a file. We cannot simply 
try downloading a file when it cannot be found locally. 
Sometimes scripts exist in a remote dir that the Hadoop cluster nodes can 
access but the client cannot.




[jira] Commented: (HIVE-1624) Patch to allows scripts in S3 location

2010-09-22 Thread Vaibhav Aggarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913875#action_12913875
 ] 

Vaibhav Aggarwal commented on HIVE-1624:


Hi

I have attached a new patch.
It uses the add_resource functionality to make the script available to all 
nodes instead of downloading the script on each node.

Thanks
Vaibhav




[jira] Commented: (HIVE-1624) Patch to allows scripts in S3 location

2010-09-21 Thread Vaibhav Aggarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913124#action_12913124
 ] 

Vaibhav Aggarwal commented on HIVE-1624:


Thanks for looking at this. I will experiment with the suggested approach today.




[jira] Commented: (HIVE-1624) Patch to allows scripts in S3 location

2010-09-13 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908729#action_12908729
 ] 

He Yongqiang commented on HIVE-1624:


S3 -> client -> cluster may be better than directly downloading the script from 
S3 to the TaskTracker node.
There may be thousands of concurrent download requests to S3 for 
a script. (I agree that the script can be cached on the local machine, but right 
now Hive does not do any cache cleanup.)
S3 -> client -> cluster will be able to use the Hadoop distributed cache.
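
The request-count argument can be illustrated with a small simulation. This is a sketch only, not Hive or Hadoop code: FetchOnceSketch, its remote-store function, the example URI, and the task count of 1000 are hypothetical stand-ins for S3, the client, and the cluster tasks.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class FetchOnceSketch {
    // Stand-in for the remote store (S3 in the discussion).
    private final Function<String, String> remoteStore;
    // Copies already fetched by the client, keyed by URI.
    private final Map<String, String> clientCopies = new HashMap<>();
    private int remoteRequests = 0;

    public FetchOnceSketch(Function<String, String> remoteStore) {
        this.remoteStore = remoteStore;
    }

    // Client-side fetch: the remote store is contacted at most once per URI.
    public String fetchOnClient(String uri) {
        if (!clientCopies.containsKey(uri)) {
            remoteRequests++;
            clientCopies.put(uri, remoteStore.apply(uri));
        }
        return clientCopies.get(uri);
    }

    public int remoteRequests() {
        return remoteRequests;
    }

    public static void main(String[] args) {
        FetchOnceSketch client = new FetchOnceSketch(uri -> "contents of " + uri);
        // Each simulated task is served from the client's copy, so the
        // remote store sees one request instead of one per task.
        for (int task = 0; task < 1000; task++) {
            client.fetchOnClient("s3://example-bucket/script.py");
        }
        System.out.println(client.remoteRequests()); // prints 1
    }
}
```

In a real deployment the fan-out from client to tasks would be handled by the cluster's own distribution mechanism rather than by repeated calls on the client; the sketch only shows why the number of remote requests stays constant as the task count grows.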
