Re: hive.query.string not reflecting the current query
Hi,
Thank you all for your replies. I switched to using 'hive.io.filter.text', in line with Peter's reply. I also implemented the filter negotiation mechanism (HiveStoragePredicateHandler) in my storage handler. It works very well so far, even though the filter negotiation mechanism is a bit limited in the expressions it allows. I'll bring up that question in a separate thread.
Br,
Petter
2013/12/5 Peter Marron peter.mar...@trilliumsoftware.com
Maybe the property 'hive.io.filter.expr.serialized' is something that can help?
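For readers following along, here is a minimal sketch of the idea of reading the pushed-down filter from the job configuration instead of parsing hive.query.string. It is not Petter's actual code. The class `PushedFilterLookup` and its methods are hypothetical, a plain `Map` stands in for the Hadoop job configuration a RecordReader would receive, the sample filter text is illustrative, and treating an absent or empty value as "no filter was pushed" is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of Hive. A plain Map stands in for the
// Hadoop job configuration that the custom RecordReader receives.
final class PushedFilterLookup {
    // Property names taken from this thread.
    static final String FILTER_TEXT = "hive.io.filter.text";
    static final String FILTER_EXPR = "hive.io.filter.expr.serialized";

    private PushedFilterLookup() {}

    /**
     * Returns the trimmed filter text, or empty when the property is absent
     * or blank (assumed here to mean that no filter was pushed to this scan).
     */
    static Optional<String> filterText(Map<String, String> conf) {
        String text = conf.get(FILTER_TEXT);
        if (text == null || text.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(text.trim());
    }

    /** True when a serialized filter expression is present in the configuration. */
    static boolean hasSerializedFilter(Map<String, String> conf) {
        String expr = conf.get(FILTER_EXPR);
        return expr != null && !expr.isEmpty();
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // A possibly stale query string is deliberately ignored by the lookup.
        conf.put("hive.query.string", "select * from mytable where mycolumn='myvalue'");
        System.out.println(filterText(conf).orElse("<no filter, read everything>"));

        // Illustrative value only; the real text is whatever Hive writes.
        conf.put(FILTER_TEXT, "(mycolumn = 'myvalue')");
        System.out.println(filterText(conf).orElse("<no filter, read everything>"));
    }
}
```

The point of the design is that the reader never consults hive.query.string, so a stale value there cannot cause a filter from an earlier query to be applied.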
RE: hive.query.string not reflecting the current query
Hi,
Sorry for the late reply. Maybe the property 'hive.io.filter.expr.serialized' can help? It works for me, and it certainly works in the case where the query does not result in a Map/Reduce job (which is something that I rely on). (If you google it you should be able to find out more about it.)
Regards,
Peter Marron
Senior Developer, Research Development
peter.mar...@trilliumsoftware.com
From: Petter von Dolwitz (Hem) [mailto:petter.von.dolw...@gmail.com]
Sent: 03 December 2013 12:46
To: user@hive.apache.org
Subject: hive.query.string not reflecting the current query
Anybody knows why the query string is not updated in the second case?
hive.query.string not reflecting the current query
Hi,
I use Hive 0.11 with a five-machine cluster. I am reading the property hive.query.string from a custom RecordReader (used for reading external tables). If I first invoke a query like
    select * from mytable where mycolumn='myvalue';
I get the correct query string in this property. If I then invoke
    select * from mytable limit 100;
the property hive.query.string still contains the first query. It seems that Hive uses local mode for the second query; I don't know if that is related. Does anybody know why the query string is not updated in the second case?
Thanks,
Petter
Re: hive.query.string not reflecting the current query
Hmmm? Maybe it is related to the fact that a query such as
    select * from mytable limit 100;
does not start any MapReduce job. It only starts a read operation from HDFS (and communicates with the MetaStore to learn the schema and how to parse the data using the InputFormat and SerDe).
For example, if you run a query that has the same functionality (i.e. shows all content of the table by listing all columns in the SELECT)
    select column1, column2, ... columnN from mytable limit 100;
then a map-only job will be started and maybe (?) hive.query.string will contain this query.
2013/12/3 Petter von Dolwitz (Hem) petter.von.dolw...@gmail.com
Anybody knows why the query string is not updated in the second case?
Re: hive.query.string not reflecting the current query
Yes, it seems related. I think the query string is not refreshed when Hive decides to run without a MapReduce job. The problem is that I try to interact with the query string to apply an early filter in the record reader. Is there any other known way to detect that a MapReduce job is not spawned, so that I can work around this issue?
/Petter
On Tuesday, 3 December 2013, Adam Kawa wrote:
Maybe it is related to the fact, that a query: select * from mytable limit 100; does not start any MapReduce job.
Re: hive.query.string not reflecting the current query
Looks like a bug. I've filed it as https://issues.apache.org/jira/browse/HIVE-5935.
2013/12/4 Adam Kawa kawa.a...@gmail.com
Maybe you can parse the output of the EXPLAIN operator applied to your query (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain) or look for another configuration property (e.g. one saying that the number of map and reduce tasks is equal to 0, or something).
2013/12/3 Petter von Dolwitz (Hem) petter.von.dolw...@gmail.com
Any other known way to detect that a map reduce job is not spawned so that I can work around this issue?
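Adam's suggestion of parsing EXPLAIN output could look roughly like the sketch below. It rests on an assumption about the plan text: that a stage which runs as a MapReduce job shows up as a line reading "Map Reduce", while a fetch-only query lists only a "Fetch Operator" stage. The class `ExplainPlanCheck` is hypothetical and the sample plans are abbreviated, hand-written illustrations, so check them against the EXPLAIN output of your own Hive version before relying on this.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hive. It only inspects EXPLAIN text that
// the caller has already obtained (for example over JDBC or from the CLI).
final class ExplainPlanCheck {
    // Assumption: a MapReduce stage is printed as a line containing only
    // "Map Reduce" (plus indentation).
    private static final Pattern MAP_REDUCE_STAGE =
            Pattern.compile("(?m)^[ \\t]*Map Reduce[ \\t]*$");

    private ExplainPlanCheck() {}

    /** Returns true when the plan text contains at least one MapReduce stage. */
    static boolean spawnsMapReduce(String explainOutput) {
        return MAP_REDUCE_STAGE.matcher(explainOutput).find();
    }

    public static void main(String[] args) {
        // Abbreviated, illustrative plan for: select * from mytable limit 100;
        String fetchOnly =
                "STAGE PLANS:\n"
              + "  Stage: Stage-0\n"
              + "    Fetch Operator\n"
              + "      limit: 100\n";
        // Abbreviated, illustrative plan for a query with a where clause.
        String withJob =
                "STAGE PLANS:\n"
              + "  Stage: Stage-1\n"
              + "    Map Reduce\n"
              + "      Alias -> Map Operator Tree:\n"
              + "  Stage: Stage-0\n"
              + "    Fetch Operator\n"
              + "      limit: -1\n";
        System.out.println("fetch-only plan spawns a job: " + spawnsMapReduce(fetchOnly));
        System.out.println("filtered plan spawns a job:   " + spawnsMapReduce(withJob));
    }
}
```

Note that this check has to run on the client side before the query is submitted; it does not help code that only sees the job configuration inside the RecordReader.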