This looks like a bug in Hive. See below:

hive> set hive.merge.mapfiles=true;
hive> explain from netflix insert overwrite table t1 select movie_id insert overwrite table t2 select user_id;
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF netflix)) (TOK_INSERT (TOK_DESTINATION (TOK_TAB t1)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL movie_id)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB t2)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL user_id)))))
STAGE DEPENDENCIES:
  Stage-2 is a root stage
  Stage-5 depends on stages: Stage-2
  Stage-0 depends on stages: Stage-5
  Stage-8 depends on stages: Stage-2
  Stage-1 depends on stages: Stage-8

STAGE PLANS:
  Stage: Stage-2
    Map Reduce
      Alias -> Map Operator Tree:
        netflix
          TableScan
            alias: netflix
            Select Operator
              expressions:
                    expr: movie_id
                    type: string
              outputColumnNames: _col0
              File Output Operator
                compressed: true
                GlobalTableId: 1
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    name: t1
            Select Operator
              expressions:
                    expr: user_id
                    type: string
              outputColumnNames: _col0
              File Output Operator
                compressed: true
                GlobalTableId: 2
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    name: t2

  Stage: Stage-5
    Conditional Operator
      list of dependent Tasks:
          Move Operator
            files:
                hdfs directory: true
                destination: hdfs://hdpnn.cm3:9000/group/tbdev/zhoumin/hive-tmp/794320195/10000
          Map Reduce
            Alias -> Map Operator Tree:
              hdfs://hdpnn.cm3:9000/group/tbdev/zhoumin/hive-tmp/570535800/10004
                  Reduce Output Operator
                    sort order:
                    Map-reduce partition columns:
                          expr: rand()
                          type: double
                    tag: -1
                    value expressions:
                          expr: foo
                          type: string
            Reduce Operator Tree:
              Extract
                File Output Operator
                  compressed: true
                  GlobalTableId: 0
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                      name: t1

  Stage: Stage-0
    Move Operator
      tables:
          replace: true
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: t1

  Stage: Stage-8
    Conditional Operator
      list of dependent Tasks:
          Move Operator
            files:
                hdfs directory: true
                destination: hdfs://hdpnn.cm3:9000/group/tbdev/zhoumin/hive-tmp/794320195/10002
          Map Reduce
            Alias -> Map Operator Tree:
              hdfs://hdpnn.cm3:9000/group/tbdev/zhoumin/hive-tmp/570535800/10005
                  Reduce Output Operator
                    sort order:
                    Map-reduce partition columns:
                          expr: rand()
                          type: double
                    tag: -1
                    value expressions:
                          expr: bar
                          type: string
            Reduce Operator Tree:
              Extract
                File Output Operator
                  compressed: true
                  GlobalTableId: 0
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                      name: t2

  Stage: Stage-1
    Move Operator
      tables:
          replace: true
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: t2

-----------------------------------------------------------------------------

hive> set hive.merge.mapfiles=false;
hive> explain from netflix insert overwrite table t1 select movie_id insert overwrite table t2 select user_id;
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF netflix)) (TOK_INSERT (TOK_DESTINATION (TOK_TAB t1)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL movie_id)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB t2)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL user_id)))))

STAGE DEPENDENCIES:
  Stage-2 is a root stage
  Stage-0 depends on stages: Stage-2

STAGE PLANS:
  Stage: Stage-2
    Map Reduce
      Alias -> Map Operator Tree:
        netflix
          TableScan
            alias: netflix
            Select Operator
              expressions:
                    expr: movie_id
                    type: string
              outputColumnNames: _col0
              File Output Operator
                compressed: true
                GlobalTableId: 1
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    name: t1
            Select Operator
              expressions:
                    expr: user_id
                    type: string
              outputColumnNames:
                  _col0
              File Output Operator
                compressed: true
                GlobalTableId: 2
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    name: t2

  Stage: Stage-0
    Move Operator
      tables:
          replace: true
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: t1

From the explain output above, we can see that in the second case (hive.merge.mapfiles=false) the plan generates only one Move Operator, for t1. That is why t2 ends up empty.

Regards,
Min

On Wed, Jan 13, 2010 at 5:01 PM, Zheng Shao <zsh...@gmail.com> wrote:
> https://issues.apache.org/jira/browse/HIVE-634
> As far as I know there is nobody working on that right now. If you are
> interested, we can work together on that.
> Let's move the discussion to the JIRA.
>
> Zheng
>
> On Tue, Jan 12, 2010 at 3:27 AM, Anty <anty....@gmail.com> wrote:
>>
>> Thanks Zheng.
>> We have used RegexSerDe in some use cases, but the speed is indeed slower,
>> so we don't want to use regular expressions if not necessary.
>>
>> I found HIVE-634 is what I need: it allows the user to specify a field
>> delimiter in any format.
>>
>> INSERT OVERWRITE LOCAL DIRECTORY '/mnt/daily_timelines'
>> [ ROW FORMAT DELIMITED | SERDE ... ]
>> [ FILE FORMAT ...]
>> SELECT * FROM daily_timelines;
>>
>> Is somebody still working on this feature?
>>
>> On Tue, Jan 12, 2010 at 2:28 PM, Zheng Shao <zsh...@gmail.com> wrote:
>> > Yes, we only support one-byte delimiters for performance reasons.
>> >
>> > You can use the RegexSerDe in the contrib package for any row format
>> > that allows a regular expression (including your case "<>"), but the
>> > speed will be slower.
>> >
>> > Zheng
>> >
>> > On Mon, Jan 11, 2010 at 5:54 PM, Anty <anty....@gmail.com> wrote:
>> >>
>> >> Thanks Zheng.
>> >> It does work.
>> >> I have another question: if the field delimiter is a string, e.g.
>> >> "<>", it looks like LazySimpleSerDe can't handle it. Does
>> >> LazySimpleSerDe not support string field delimiters, only one-byte
>> >> control characters?
>> >>
>> >> On Tue, Jan 12, 2010 at 3:05 AM, Zheng Shao <zsh...@gmail.com> wrote:
>> >> > For your second question, currently we can do it with a little extra
>> >> > work:
>> >> > 1. Create an external table on the target directory with the field
>> >> > delimiter you want;
>> >> > 2. Run the query and insert overwrite the target external table.
>> >> >
>> >> > For the first question we can also do a similar thing (create a
>> >> > bunch of external tables and then insert), but I think we should fix
>> >> > the problem.
>> >> >
>> >> > Zheng
>> >> >
>> >> > On Mon, Jan 11, 2010 at 8:31 AM, Anty <anty....@gmail.com> wrote:
>> >> >> HI:
>> >> >> I came across the same problem: there is no data. I have one
>> >> >> more question: can I specify the field delimiter for the output
>> >> >> file, not just the default ctrl-a field delimiter?
>> >> >>
>> >> >> On Fri, Jan 8, 2010 at 2:23 PM, wd <w...@wdicc.com> wrote:
>> >> >>> hi,
>> >> >>>
>> >> >>> I've tried the hive svn version; it seems this bug still exists.
>> >> >>>
>> >> >>> svn st -v
>> >> >>>
>> >> >>> 896805 896744 namit .
>> >> >>> 896805 894292 namit eclipse-templates
>> >> >>> 896805 894292 namit eclipse-templates/.classpath
>> >> >>> 896805 765509 zshao eclipse-templates/TestHive.launchtemplate
>> >> >>> 896805 765509 zshao eclipse-templates/TestMTQueries.l
>> >> >>> ..........
>> >> >>>
>> >> >>> svn revision 896805?
>> >> >>>
>> >> >>> Following is the execution log.
>> >> >>>
>> >> >>> hive> from test
>> >> >>>     > INSERT OVERWRITE LOCAL DIRECTORY '/home/stefdong/tmp/0' select * where a = 1
>> >> >>>     > INSERT OVERWRITE LOCAL DIRECTORY '/home/stefdong/tmp/1' select * where a = 3;
>> >> >>> Total MapReduce jobs = 1
>> >> >>> Launching Job 1 out of 1
>> >> >>> Number of reduce tasks is set to 0 since there's no reduce operator
>> >> >>> Starting Job = job_201001071716_4691, Tracking URL =
>> >> >>> http://abc.com:50030/jobdetails.jsp?jobid=job_201001071716_4691
>> >> >>> Kill Command = hadoop job -Dmapred.job.tracker=abc.com:9001 -kill job_201001071716_4691
>> >> >>> 2010-01-08 14:14:55,442 Stage-2 map = 0%, reduce = 0%
>> >> >>> 2010-01-08 14:15:00,643 Stage-2 map = 100%, reduce = 0%
>> >> >>> Ended Job = job_201001071716_4691
>> >> >>> Copying data to local directory /home/stefdong/tmp/0
>> >> >>> Copying data to local directory /home/stefdong/tmp/0
>> >> >>> 13 Rows loaded to /home/stefdong/tmp/0
>> >> >>> 9 Rows loaded to /home/stefdong/tmp/1
>> >> >>> OK
>> >> >>> Time taken: 9.409 seconds
>> >> >>>
>> >> >>> thx.
>> >> >>>
>> >> >>> 2010/1/6 wd <w...@wdicc.com>
>> >> >>>>
>> >> >>>> hi,
>> >> >>>>
>> >> >>>> A single insert can extract data into '/tmp/out/1'. I can even see
>> >> >>>> "xxx rows loaded to '/tmp/out/0', xxx rows loaded to '/tmp/out/1'" etc.
>> >> >>>> in multi inserts, but there is no data in fact.
>> >> >>>>
>> >> >>>> Haven't tried an svn revision; will try it today. thx.
>> >> >>>>
>> >> >>>> 2010/1/5 Zheng Shao <zsh...@gmail.com>
>> >> >>>>>
>> >> >>>>> Looks like a bug.
>> >> >>>>> What is the svn revision of Hive?
>> >> >>>>>
>> >> >>>>> Did you verify that single insert into '/tmp/out/1' produces
>> >> >>>>> non-empty files?
>> >> >>>>>
>> >> >>>>> Zheng
>> >> >>>>>
>> >> >>>>> On Tue, Jan 5, 2010 at 12:51 AM, wd <w...@wdicc.com> wrote:
>> >> >>>>> > In the hive wiki:
>> >> >>>>> >
>> >> >>>>> > Hive extension (multiple inserts):
>> >> >>>>> > FROM from_statement
>> >> >>>>> > INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
>> >> >>>>> > [INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ...
>> >> >>>>> >
>> >> >>>>> > I'm trying to use hive multi inserts to extract data from hive to
>> >> >>>>> > local disk. Following is the hql:
>> >> >>>>> >
>> >> >>>>> > from test_tbl
>> >> >>>>> > INSERT OVERWRITE LOCAL DIRECTORY '/tmp/out/0' select * where id%10=0
>> >> >>>>> > INSERT OVERWRITE LOCAL DIRECTORY '/tmp/out/1' select * where id%10=1
>> >> >>>>> > INSERT OVERWRITE LOCAL DIRECTORY '/tmp/out/2' select * where id%10=2
>> >> >>>>> >
>> >> >>>>> > This hql can execute, but only /tmp/out/0 has a datafile in it;
>> >> >>>>> > the other directories are empty. Why does this happen? A bug?
>> >> >>>>>
>> >> >>>>> --
>> >> >>>>> Yours,
>> >> >>>>> Zheng
>> >> >>
>> >> >> --
>> >> >> Best Regards
>> >> >> Anty Rao

--
My research interests are distributed systems, parallel computing and bytecode-based virtual machines.
My profile: http://www.linkedin.com/in/coderplay
My blog: http://coderplay.javaeye.com
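[Editorial sketch] Zheng's two-step workaround quoted above (create an external table over the target directory with the row format you want, then insert overwrite it instead of using INSERT OVERWRITE DIRECTORY) can be written in HiveQL roughly as follows. This is a sketch, not a tested recipe: the table name `export_out`, the columns, the `'|'` delimiter, and the paths are hypothetical, and exact syntax may vary by Hive version.

```sql
-- 1. Create an external table over the target directory with a
--    one-byte field delimiter of your choosing (here '|').
--    LazySimpleSerDe only supports single-byte delimiters; for a
--    multi-character delimiter like "<>" you would need RegexSerDe
--    or the HIVE-634 syntax discussed in this thread.
CREATE EXTERNAL TABLE export_out (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION '/tmp/out/0';

-- 2. Run the query and overwrite the external table; the files under
--    /tmp/out/0 then contain plain text rows delimited by '|'
--    rather than the default ctrl-A.
INSERT OVERWRITE TABLE export_out
SELECT id, name FROM test_tbl WHERE id % 10 = 0;
```

For the multi-insert case, one such external table per output directory would be needed, which is the "bunch of external tables" Zheng mentions; it sidesteps the empty-directory bug because each INSERT targets a real table with its own Move task.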