[ 
https://issues.apache.org/jira/browse/HIVE-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756630#action_12756630
 ] 

Namit Jain commented on HIVE-837:
---------------------------------

yesterday, i was having a offline conversation with Raghu, and we were thinking 
that this is similar to the concept of buckets that exists currently.
So, do we enhance the tablesample clause to include filenames also, and not 
expose it as a virtual column at all ?

I think it is more intuitive to have filenames in the where clause - maybe, we 
should have some virtual columns for buckets also and leave the current
syntax for buckets as is for backward compatibility.

File pruning is must - so, having the filename as udf might be more difficult. 
The udf filename() will return the same value at compile time.
So, I would prefer virtual columns instead of filenames. 

SELECT * FROM weblogs DATAFILE ('log1.txt', 'log2.txt') WHERE col1='..' and 
col2= ...
would solve the pruning problem since the file names are part of the syntax, 
but how do you propose to select the filename in that case ?

So, I think the original syntax:
select * FROM logs where FILENAME LIKE(WEB1*)
might be easier

> virtual column support (filename) in hive
> -----------------------------------------
>
>                 Key: HIVE-837
>                 URL: https://issues.apache.org/jira/browse/HIVE-837
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Namit Jain
>
> Copying from some mails:
> I am dumping files into a hive partion on five minute intervals. I am using 
> LOAD DATA into a partition.
> weblogs
> web1.00
> web1.05
> web1.10
> ...
> web2.00
> web2.05
> web1.10
> ....
> Things that would be useful..
> Select files from the folder with a regex or exact name
> select * FROM logs where FILENAME LIKE(WEB1*)
> select * FROM LOGS WHERE FILENAME=web2.00
> Also it would be nice to be able to select offsets in a file, this would make 
> sense with appends
> select * from logs WHERE FILENAME=web2.00 FROMOFFSET=454644 [TOOFFSET=]
> select  
> substr(filename, 4, 7) as  class_A, 
> substr(filename,  8, 10) as class_B
> count( x ) as cnt
> from FOO
> group by
> substr(filename, 4, 7), 
> substr(filename,  8, 10) ;
> Hive should support virtual columns

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to