thx,yufeng,your solution save the day^_^ 2017-04-29 13:50 GMT+08:00 yu feng <olaptes...@gmail.com>:
> We has faced this problem before with JDBC, this query speed too much time > on fetching result, and the query is running. which will occupy slot of > request pool(when we configured admission control), we have to cancel the > query after a Certain time time such as 5 minutes. > > In our solution, JDBC fetch result with default fetchSize = 50, we try to > set fetchSize = 10000(by call setFetchSize()) and the acceleration is > obvious(in our test, result size is 2000+W and this can speed up from more > than 1 hour to 500 seconds.). > > 2017-04-29 0:39 GMT+08:00 Mostafa Mokhtar <mmokh...@cloudera.com>: > > > Hi Jeszy > > > > > > One option for large data extracts is to create a text table in HDFS then > > use "hdfs dfs -copyToLocal > > <https://hadoop.apache.org/docs/current/hadoop-project- > dist/hadoop-common/ > > FileSystemShell.html#copyToLocal>" > > to extract the text file(s), for large data dumps storing the data in > HDFS > > then extracting it is likely to be faster than impala-shell or JDCB. > > > > When using impala-shell it is best practice to use -B to speedup fetching > > the results. > > > > Below are the results for two experiments, storing the results in an HDFS > > text table and writing the out of HDFS file took 14 seconds compared to > 40 > > seconds when using just impala-shell. > > > > Lineitem table was used, output data size is 390MB which is comparable to > > your use case. > > > > *Write to HDFS then extract* > > > > time (impala-shell -q "drop table if exists tpch_300_parquet.lineitem_3m; > > create table tpch_300_parquet.lineitem_3m stored as textfile as select * > > from tpch_300_parquet.lineitem limit 3000000;" ; hdfs dfs -copyToLocal > > hdfs:// > > vd1315.domain.com:8020/user/hive/warehouse/tpch_300_ > parquet.db/lineitem_3m > > .; du -sh lineitem_3m ) > > Starting Impala Shell without Kerberos authentication > > Connected to server1.domain.com:21000 > > Server version: impalad version 2.9.0-SNAPSHOT RELEASE (build > > dc25acd558e9388290503e0e1100baac2b3c800b) > > Query: drop table if exists tpch_300_parquet.lineitem_3m > > Query: create table tpch_300_parquet.lineitem_3m stored as textfile as > > select * from tpch_300_parquet.lineitem limit 3000000 > > Query submitted at: 2017-04-28 09:26:23 (Coordinator: > > http://server1.domain.com:25000) > > Query progress can be monitored at: > > http://server1.domain.com:25000/query_plan?query_id=a24f86518c3de1ce: > > 2f35ef9b00000000 > > +-------------------------+ > > | summary | > > +-------------------------+ > > | Inserted 3000000 row(s) | > > +-------------------------+ > > Fetched 1 row(s) in 11.23s > > 390M lineitem_3m > > > > *real 0m14.392s* > > user 0m3.503s > > sys 0m0.655s > > > > > > *Extract results using impala-shell* > > > > [mmokhtar@server1 ~]$ time (impala-shell -B -q "select * From > > tpch_300_parquet.lineitem_3m " > lineitem_3m_impala_shell.txt; du -sh > > lineitem_3m_impala_shell.txt; ) > > Starting Impala Shell without Kerberos authentication > > Connected to server1.domain.com:21000 > > Server version: impalad version 2.9.0-SNAPSHOT RELEASE (build > > dc25acd558e9388290503e0e1100baac2b3c800b) > > Query: select * From tpch_300_parquet.lineitem_3m > > Query submitted at: 2017-04-28 09:27:39 (Coordinator: > > http://server1.domain.com:25000) > > Query progress can be monitored at: > > http://server1.domain.com:25000/query_plan?query_id=ee421b6d4a2226d3: > > 8acbb75f00000000 > > Fetched 3000000 row(s) in 38.88s > > 390M lineitem_3m_impala_shell.txt > > > > *real 0m39.152s* > > user 0m26.012s > > sys 0m0.668s > > > > > > On Fri, Apr 28, 2017 at 7:26 AM, Jeszy <jes...@gmail.com> wrote: > > > > > Hey, > > > > > > It looks like all the time is spent waiting for the client to fetch the > > > results: > > > - ClientFetchWaitTimer: 17m31s > > > > > > Try doing: > > > impala-shell -B -q '<query>' > > > > > > HTH > > > > > > 2017-04-28 14:51 GMT+02:00 吴朱华 <ikew...@gmail.com>: > > > > Maybe I just paste some main thing on mail , and congratulation on > IPO > > > > thing. > > > > > > > > Unregister query: 17m42s (17m42s) > > > > > > > > Fetched 317246 row(s) in 1062.84s > > > > Query Runtime Profile: > > > > Query (id=8149e2439f43b15a:f08e570d7fbf1085): > > > > Summary: > > > > Session ID: 35436d1112b79287:9045c79c795858a5 > > > > Session Type: BEESWAX > > > > Start Time: 2017-04-28 11:50:00.292615000 > > > > End Time: 2017-04-28 12:07:43.133484000 > > > > Query Type: QUERY > > > > Query State: FINISHED > > > > Query Status: OK > > > > Impala Version: impalad version 2.5.0-cdh5-INTERNAL RELEASE > (build > > > > 43880282edc04c03c162bbea6fc85b5388e7fdde) > > > > User: impala > > > > Connected User: impala > > > > Delegated User: > > > > Network Address: ::ffff:10.44.10.186:36325 > > > > Default Db: sjzy > > > > Sql Statement: select > > > > MRECID,UNITID,PCQDM,PCQMC,PCXQDM,PCXQMC,DM,H001,H002, > > > H003,H021,H022,H023,H024,H025,H026A,H026B,H026C,H026D,H026E, > > > H026F,H026G,H027,H028,H029,H030,H031,H032,H033,H034,H035, > > > H036,H037A,H037B,H037C,H038,H039,H040,H041,H042,H043A, > > > H043B,H043C,H043D,H043E,H043F,H043G,H043H,H043I,H043J,H043K, > > > H043L,H044A,H044B,H044C,H044D,H044E,H044F,H044G,H044H,H044I, > > > H050,H051,H052,H053,H054,H055,H056,H061,H062,H063,H064,H065, > > > H066,H070,H071,H072,H073,H074,H075,H080,H100,H111,H112,H113, > > > H120,H200,H201,H202,H203,H204,H205,H206,H207,H208,H209,H210, > > > H211,H300,H320,H321,H322,H323,H324,H400,H401,H402,H403,H404, > > > H405,H406,H500,H600,H601,H602,H603,H604,H605,H606,H607,H608, > > > H609,H610,H611,H612,H613,H614,H615,H616,H621A,H621B,H621C, > > > H621D,H621E,H621F,H622A,H622B,H622C,H801,H802,H803,H804,H901,H902,H903 > > > > FROM NP_2017_NP601 WHERE DS_AREACODE LIKE '445281%' > > > > Coordinator: node1.sky.org:22000 > > > > Query Options (non default): > > > > Plan: > > > > ---------------- > > > > Estimated Per-Host Requirements: Memory=4.50GB VCores=1 > > > > > > > > 01:EXCHANGE [UNPARTITIONED] > > > > | hosts=4 per-host-mem=unavailable > > > > | tuple-ids=0 row-size=1.67KB cardinality=1155911 > > > > | > > > > 00:SCAN HDFS [sjzy.np_2017_np601, RANDOM] > > > > partitions=1/1 files=20 size=1.06GB > > > > predicates: DS_AREACODE LIKE '445281%' > > > > table stats: 11559109 rows total > > > > column stats: all > > > > hosts=4 per-host-mem=4.50GB > > > > tuple-ids=0 row-size=1.67KB cardinality=1155911 > > > > ---------------- > > > > Estimated Per-Host Mem: 4831838208 > > > > Estimated Per-Host VCores: 1 > > > > Request Pool: default-pool > > > > ExecSummary: > > > > Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak > > Mem > > > > Est. Peak Mem Detail > > > > ------------------------------------------------------------ > > > ------------------------------------------------- > > > > 01:EXCHANGE 1 32.314ms 32.314ms 317.25K 1.16M > > 0 > > > > -1.00 B UNPARTITIONED > > > > 00:SCAN HDFS 20 1s137ms 1s348ms 317.25K 1.16M > 163.85 > > MB > > > > 4.50 GB sjzy.np_2017_np601 > > > > Planner Timeline: 53.683ms > > > > - Analysis finished: 24.565ms (24.565ms) > > > > - Equivalence classes computed: 26.389ms (1.823ms) > > > > - Single node plan created: 33.607ms (7.218ms) > > > > - Runtime filters computed: 33.684ms (76.568us) > > > > - Distributed plan created: 39.125ms (5.441ms) > > > > - Planning finished: 53.683ms (14.558ms) > > > > Query Timeline: 17m42s > > > > - Start execution: 43.792us (43.792us) > > > > - Planning finished: 60.640ms (60.596ms) > > > > - Ready to start 20 remote fragments: 65.111ms (4.471ms) > > > > - All 20 remote fragments started: 74.572ms (9.461ms) > > > > - Rows available: 744.300ms (669.728ms) > > > > - First row fetched: 790.128ms (45.828ms) > > > > - Unregister query: 17m42s (17m42s) > > > > ImpalaServer: > > > > - ClientFetchWaitTimer: 17m31s > > > > - RowMaterializationTimer: 10s024ms > > > > > > > > 2017-04-28 19:44 GMT+08:00 Jim Apple <jbap...@cloudera.com>: > > > > > > > >> dev@ does not appear to accept attachments. You can upload it > > somewhere > > > >> and > > > >> post a link, though. > > > >> > > > >> On Thu, Apr 27, 2017 at 11:35 PM, 吴朱华 <ikew...@gmail.com> wrote: > > > >> > > > >> > Oops, I just resend it, you know the chinese network^_^ > > > >> > > > > >> > 2017-04-28 14:20 GMT+08:00 Mostafa Mokhtar <mmokh...@cloudera.com > >: > > > >> > > > > >> >> Btw the profile wasn't attached. > > > >> >> Please resend. > > > >> >> > > > >> >> On Thu, Apr 27, 2017 at 11:11 PM, 吴朱华 <ikew...@gmail.com> wrote: > > > >> >> > > > >> >>> Profile is in the attachment, thanks > > > >> >>> > > > >> >>> > > > >> >>> 2017-04-28 13:10 GMT+08:00 Dimitris Tsirogiannis < > > > >> >>> dtsirogian...@cloudera.com>: > > > >> >>> > > > >> >>>> Maybe you also want to post some information about the schema > > (how > > > >> wide > > > >> >>>> your table is, does it use nested types, etc) as well as the > > > profile > > > >> of > > > >> >>>> the > > > >> >>>> slow query. > > > >> >>>> > > > >> >>>> Dimitris > > > >> >>>> > > > >> >>>> On Thu, Apr 27, 2017 at 9:30 PM, 吴朱华 <ikew...@gmail.com> > wrote: > > > >> >>>> > > > >> >>>> > Hi guys: > > > >> >>>> > we can facing a big issue when select * from a big table. > > > >> >>>> > The performance is 17minutes for retrieving 400MB data. Even > > slow > > > >> >>>> under > > > >> >>>> > JDBC situation. > > > >> >>>> > Is there anyway to improve it?^_^ > > > >> >>>> > > > > >> >>>> > > > >> >>> > > > >> >>> > > > >> >> > > > >> > > > > >> > > > > > >