Re: S3 connections

2017-11-08 Thread Mostafa Mokhtar
It should be safe to apply this setting to all machine sizes.
This setting is mostly a workaround for S3 connector timeout failures that
look like the one below.

The default value is too low to run single-user queries reliably.

I1227 19:29:41.471863  1490 AmazonHttpClient.java:496] Unable to execute
HTTP request: Timeout waiting for connection from pool
Java exception follows:
com.cloudera.org.apache.http.conn.ConnectionPoolTimeoutException: Timeout
waiting for connection from pool
at com.cloudera.org.apache.http.impl.conn.PoolingClientConnectionManager
.leaseConnection(PoolingClientConnectionManager.java:232)
at com.cloudera.org.apache.http.impl.conn.PoolingClientConnectionManager
$1.getConnection(PoolingClientConnectionManager.java:199)
at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(
DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.cloudera.com.amazonaws.http.conn.ClientConnectionRequestFactory
$Handler.invoke(ClientConnectionRequestFactory.java:70)
at com.cloudera.com.amazonaws.http.conn.$Proxy21.getConnection(Unknown
Source)
at com.cloudera.org.apache.http.impl.client.DefaultRequestDirector.execute(
DefaultRequestDirector.java:456)
at com.cloudera.org.apache.http.impl.client.AbstractHttpClient.execute(
AbstractHttpClient.java:906)
at com.cloudera.org.apache.http.impl.client.AbstractHttpClient.execute(
AbstractHttpClient.java:805)
at com.cloudera.com.amazonaws.http.AmazonHttpClient.executeOneRequest(
AmazonHttpClient.java:728)
at com.cloudera.com.amazonaws.http.AmazonHttpClient.executeHelper(
AmazonHttpClient.java:489)
at com.cloudera.com.amazonaws.http.AmazonHttpClient.execute(
AmazonHttpClient.java:310)
at com.cloudera.com.amazonaws.services.s3.AmazonS3Client.
invoke(AmazonS3Client.java:3785)
at com.cloudera.com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(
AmazonS3Client.java:1050)
at com.cloudera.com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(
AmazonS3Client.java:1027)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(
S3AFileSystem.java:913)
at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:394)



On Wed, Nov 8, 2017 at 9:12 AM, Jim Apple  wrote:

> http://impala.apache.org/docs/build/html/topics/impala_s3.html
> recommends "Set the safety valve fs.s3a.connection.maximum to 1500 for
> impalad." For best performance, should this be increased for nodes
> with very high CPU, RAM, or bandwidth? Or decreased for less-beefy
> nodes?
>


Re: long codegen time while codegen disabled

2017-11-08 Thread Mostafa Mokhtar
From the profile, codegen is disabled for HDFS_SCAN_NODE (id=8), not for the
entire query.
If you wish to disable codegen for the whole query, run "set disable_codegen=1;"
before executing the query from impala-shell, or add it to the connection
string if using JDBC.

   HDFS_SCAN_NODE (id=8):(Total: 2.314ms, non-child: 2.314ms, % non-child: 100.00%)
     Hdfs split stats (<volume id>:<# splits>/<split lengths>): 2:1/14.28 KB
     Hdfs Read Thread Concurrency Bucket: 0:0% 1:0% 2:0% 3:0%
     File Formats: PARQUET/SNAPPY:3
     ExecOption: Codegen enabled: 0 out of 1


On a side note, I recommend trying out a more recent version of Impala,
as a lot has improved since.


On Wed, Nov 8, 2017 at 12:13 AM, chen  wrote:

> I have a query that took a long time in codegen:
>
>   CodeGen:(Total: 32m22s, non-child: 32m22s, % non-child: 100.00%)
>  - CodegenTime: 0ns
>  - CompileTime: 53.143ms
>  - LoadTime: 58.680us
>  - ModuleFileSize: 1.96 MB (2054956)
>  - OptimizationTime: 32m22s
>  - PrepareTime: 157.700ms
>
> But from the profile, we can see that codegen is disabled for this query:
>
> ExecOption: Codegen enabled: 0 out of 1
>
> Attached is the complete profile.
>
>
> Can anyone help figure out a way to bypass this?
>
>
> Chen
>
>


Re: performance issue on big table join

2017-11-01 Thread Mostafa Mokhtar
Attaching the query profile will be most helpful to investigate this issue.

If you can capture the profile from the WebUI on the coordinator node it
would be great.

On Wed, Nov 1, 2017 at 6:22 PM, 俊杰陈  wrote:

> Thanks Hongxu,
>
> Here are configurations on my cluster,  most of them are default values.
> Which item do you think it may impact?
>
> ABORT_ON_DEFAULT_LIMIT_EXCEEDED: [0]
> ABORT_ON_ERROR: [0]
> ALLOW_UNSUPPORTED_FORMATS: [0]
> APPX_COUNT_DISTINCT: [0]
> BATCH_SIZE: [0]
> COMPRESSION_CODEC: [NONE]
> DEBUG_ACTION: []
> DEFAULT_ORDER_BY_LIMIT: [-1]
> DISABLE_CACHED_READS: [0]
> DISABLE_CODEGEN: [0]
> DISABLE_OUTERMOST_TOPN: [0]
> DISABLE_ROW_RUNTIME_FILTERING: [0]
> DISABLE_STREAMING_PREAGGREGATIONS: [0]
> DISABLE_UNSAFE_SPILLS: [0]
> ENABLE_EXPR_REWRITES: [1]
> EXEC_SINGLE_NODE_ROWS_THRESHOLD: [100]
> EXPLAIN_LEVEL: [1]
> HBASE_CACHE_BLOCKS: [0]
> HBASE_CACHING: [0]
> MAX_BLOCK_MGR_MEMORY: [0]
> MAX_ERRORS: [100]
> MAX_IO_BUFFERS: [0]
> MAX_NUM_RUNTIME_FILTERS: [10]
> MAX_SCAN_RANGE_LENGTH: [0]
> MEM_LIMIT: [0]
> MT_DOP: [0]
> NUM_NODES: [0]
> NUM_SCANNER_THREADS: [0]
> OPTIMIZE_PARTITION_KEY_SCANS: [0]
> PARQUET_ANNOTATE_STRINGS_UTF8: [0]
> PARQUET_FALLBACK_SCHEMA_RESOLUTION: [0]
> PARQUET_FILE_SIZE: [0]
> PREFETCH_MODE: [1]
> QUERY_TIMEOUT_S: [0]
> REPLICA_PREFERENCE: [0]
> REQUEST_POOL: []
> RESERVATION_REQUEST_TIMEOUT: [0]
> RM_INITIAL_MEM: [0]
> RUNTIME_BLOOM_FILTER_SIZE: [1048576]
> RUNTIME_FILTER_MAX_SIZE: [16777216]
> RUNTIME_FILTER_MIN_SIZE: [1048576]
> RUNTIME_FILTER_MODE: [2]
> RUNTIME_FILTER_WAIT_TIME_MS: [0]
> S3_SKIP_INSERT_STAGING: [1]
> SCAN_NODE_CODEGEN_THRESHOLD: [180]
> SCHEDULE_RANDOM_REPLICA: [0]
> SCRATCH_LIMIT: [-1]
> SEQ_COMPRESSION_MODE: [0]
> STRICT_MODE: [0]
> SUPPORT_START_OVER: [false]
> SYNC_DDL: [0]
> V_CPU_CORES: [0]
>
> 2017-10-31 15:30 GMT+08:00 Hongxu Ma :
>
> > Hi JJ
> > Considering it takes only 3 minutes on SparkSQL, maybe there are some
> > mistakes in the query options.
> > Try running "set;" in impala-shell and check all query options, e.g.:
> > BATCH_SIZE: [0]
> > DISABLE_CODEGEN: [0]
> > RUNTIME_FILTER_MODE: GLOBAL
> >
> > Just a guess, thanks.
> >
> > On 27/10/2017 10:25, 俊杰陈 wrote:
> > The profile file is damaged. Here is a screenshot for exec summary
> >
> > 2017-10-27 10:04 GMT+08:00 俊杰陈 <cjjnj...@gmail.com>:
> > Hi Devs
> >
> > I hit a performance issue on a big table join. The query takes more than 3
> > hours on Impala and only 3 minutes on Spark SQL on the same 5-node
> > cluster. When running the query, the left scanner and exchange node are very
> > slow. Did I miss some key arguments?
> >
> > You can see the profile file in the attachment.
> >
> > --
> > Thanks & Best Regards
> >
> >
> >
> > --
> > Thanks & Best Regards
> >
> >
> > --
> > Regards,
> > Hongxu.
> >
>
>
>
> --
> Thanks & Best Regards
>


Re: performance issue on big table join

2017-10-26 Thread Mostafa Mokhtar
Hi,

It looks like you are joining store_sales with catalog_sales on item_sk. This
kind of join condition is many-to-many, which means the number of output
rows will be much larger than the number of input rows. I am not sure if this
is intended.
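
To make the row blow-up concrete, here is a minimal sketch (not Impala code) of how many rows an inner equi-join on one key produces: each key contributes (occurrences on the left) x (occurrences on the right) rows. The helper name JoinOutputRows and the sample keys are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not part of Impala): counts the rows an inner
// equi-join on a single key column would produce.
std::uint64_t JoinOutputRows(const std::vector<std::int64_t>& left_keys,
                             const std::vector<std::int64_t>& right_keys) {
  // Count how often each key occurs on the right side.
  std::unordered_map<std::int64_t, std::uint64_t> right_counts;
  for (std::int64_t k : right_keys) ++right_counts[k];

  // Every left row matches all right rows that carry the same key.
  std::uint64_t out = 0;
  for (std::int64_t k : left_keys) {
    auto it = right_counts.find(k);
    if (it != right_counts.end()) out += it->second;
  }
  return out;
}
```

With left keys {1, 1, 1, 2} and right keys {1, 1, 2, 2, 3}, key 1 yields 3 x 2 rows and key 2 yields 1 x 2 rows, so 4 + 5 input rows become 8 output rows. On fact tables where an item_sk repeats many thousands of times on both sides, the same product grows very quickly.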

Also did you run "compute stats [TABLE_NAME]" on both tables?

For a more comprehensive query, try TPC-DS Q17:

select  i_item_id
   ,i_item_desc
   ,s_state
   ,count(ss_quantity) as store_sales_quantitycount
   ,avg(ss_quantity) as store_sales_quantityave
   ,stddev_samp(ss_quantity) as store_sales_quantitystdev
   ,stddev_samp(ss_quantity)/avg(ss_quantity) as store_sales_quantitycov
   ,count(sr_return_quantity) as store_returns_quantitycount
   ,avg(sr_return_quantity) as store_returns_quantityave
   ,stddev_samp(sr_return_quantity) as store_returns_quantitystdev
   ,stddev_samp(sr_return_quantity)/avg(sr_return_quantity) as store_returns_quantitycov
   ,count(cs_quantity) as catalog_sales_quantitycount
   ,avg(cs_quantity) as catalog_sales_quantityave
   ,stddev_samp(cs_quantity) as catalog_sales_quantitystdev
   ,stddev_samp(cs_quantity)/avg(cs_quantity) as catalog_sales_quantitycov
 from store_sales
 ,store_returns
 ,catalog_sales
 ,date_dim d1
 ,date_dim d2
 ,date_dim d3
 ,store
 ,item
 where d1.d_quarter_name = '2000Q1'
   and d1.d_date_sk = ss_sold_date_sk
   and i_item_sk = ss_item_sk
   and s_store_sk = ss_store_sk
   and ss_customer_sk = sr_customer_sk
   and ss_item_sk = sr_item_sk
   and ss_ticket_number = sr_ticket_number
   and sr_returned_date_sk = d2.d_date_sk
   and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3')
   and sr_customer_sk = cs_bill_customer_sk
   and sr_item_sk = cs_item_sk
   and cs_sold_date_sk = d3.d_date_sk
   and d3.d_quarter_name in ('2000Q1','2000Q2','2000Q3')
 group by i_item_id
 ,i_item_desc
 ,s_state
 order by i_item_id
 ,i_item_desc
 ,s_state
limit 100;


I recommend moving this kind of discussion to
u...@impala.incubator.apache.org.

On Thu, Oct 26, 2017 at 7:25 PM, 俊杰陈  wrote:

> The profile file is damaged. Here is a screenshot for exec summary
>
>
>
> 2017-10-27 10:04 GMT+08:00 俊杰陈 :
>
>> Hi Devs
>>
>> I hit a performance issue on a big table join. The query takes more than 3
>> hours on Impala and only 3 minutes on Spark SQL on the same 5-node
>> cluster. When running the query, the left scanner and exchange node are very
>> slow. Did I miss some key arguments?
>>
>> You can see the profile file in the attachment.
>>
>>
>>
>> --
>> Thanks & Best Regards
>>
>
>
>
> --
> Thanks & Best Regards
>


Re: Does this mem-tracker.h assertion ring a bell?

2017-10-25 Thread Mostafa Mokhtar
Maybe related to https://issues.apache.org/jira/browse/IMPALA-6099?

On Wed, Oct 25, 2017 at 10:02 AM, Philip Zeyliger 
wrote:

> Hi folks,
>
> I'm debugging some test failures related to an LLVM/AvroCodegen patch I've
> got going on. The failures are in the parallel EE tests, and most of them
> are complaining that Impala is out to lunch. It looks like the following
> assertion is firing, causing an impalad to fail, causing many tests to
> start failing. (I've also got a minidump, but the build was on
> jenkins.impala.io, so I don't think I have the symbols/binaries to use
> it.)
>
> If this sort of thing rings a bell for anyone, please holler!
>
> Obviously I'll work on reproducing this locally to figure it out.
>
> F1025 02:20:43.786911 82485 mem-tracker.h:231] Check failed:
> tracker->consumption_->current_value() >= 0 (-1052615 vs. 0)
> Runtime Filter (Coordinator): Total=-1.00 MB Peak=1.00 MB
> *** Check failure stack trace: ***
> @  0x2f1e11d  google::LogMessage::Fail()
> @  0x2f1f9c2  google::LogMessage::SendToLog()
> @  0x2f1daf7  google::LogMessage::Flush()
> @  0x2f210be  google::LogMessageFatal::~LogMessageFatal()
> @  0x17425fb  impala::MemTracker::Release()
> @  0x1fa7e8b  impala::Coordinator::UpdateFilter()
> @  0x186e3cf  impala::ImpalaServer::UpdateFilter()
> @  0x18d824f  impala::ImpalaInternalService::UpdateFilter()
> @  0x1dda35a
> impala::ImpalaInternalServiceProcessor::process_UpdateFilter()
> @  0x1dd8308
> impala::ImpalaInternalServiceProcessor::dispatchCall()
> @  0x15410ea  apache::thrift::TDispatchProcessor::process()
> @  0x171042b
> apache::thrift::server::TAcceptQueueServer::Task::run()
> @  0x170c307  impala::ThriftThread::RunRunnable()
> @  0x170da13  boost::_mfi::mf2<>::operator()()
> @  0x170d8a9  boost::_bi::list3<>::operator()<>()
> @  0x170d5f5  boost::_bi::bind_t<>::operator()()
> @  0x170d508
> boost::detail::function::void_function_obj_invoker0<>::invoke()
> @  0x171bdfc  boost::function0<>::operator()()
> @  0x19f3393  impala::Thread::SuperviseThread()
> @  0x19fbf26  boost::_bi::list4<>::operator()<>()
> @  0x19fbe69  boost::_bi::bind_t<>::operator()()
> @  0x19fbe2c  boost::detail::thread_data<>::run()
> @  0x20a7c9a  thread_proxy
> @ 0x7fe6536186ba  start_thread
> @ 0x7fe65334e3dd  clone
>


Re: Broken Link

2017-10-23 Thread Mostafa Mokhtar
This link should work:

cidrdb.org/cidr2015/Papers/CIDR15_Paper28.pdf

Will try to update the link in http://impala.apache.org/overview.html.

On Mon, Oct 23, 2017 at 7:52 PM, kenneth mcfarland <
kennethpmcfarl...@gmail.com> wrote:

> Hi Impala Crew,
>
> I really wanted to read about the architecture as I'm new, curious, and
> decided to take a swing at IMPALA-5392.
>
> This link on the /overview page is busted:
>
> http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper28.pdf
>
> Thanks in advance,
>
> Kenny
>


Re: [VOTE] Graduate to a TLP

2017-10-17 Thread Mostafa Mokhtar
+1 

Thanks 
Mostafa

> On Oct 17, 2017, at 7:09 PM, Brock Noland  wrote:
> 
> +1
> 
>> On Tue, Oct 17, 2017 at 9:07 PM, Lars Volker  wrote:
>> +1
>> 
>>> On Oct 17, 2017 19:07, "Jim Apple"  wrote:
>>> 
>>> Following our discussion
>>> https://lists.apache.org/thread.html/2f5db4788aff9b0557354b9106c0328a29c1f90c1a74a228163949d2@%3Cdev.impala.apache.org%3E
>>> , I propose that we graduate to a TLP. According to
>>> https://incubator.apache.org/guides/graduation.html#community_graduation_vote
>>> this is not required, and https://impala.apache.org/bylaws.html does not
>>> say whose votes are "binding" in a graduation vote, so all community
>>> members are welcome to vote.
>>>
>>> This will remain open 72 hours. I will be notifying general@incubator it
>>> is occurring.
>>> 
>>> This is my +1.
>>> 


Re: big issue on retrieving 400MB data

2017-04-28 Thread Mostafa Mokhtar
Options (non default):
> > Plan:
> >
> > Estimated Per-Host Requirements: Memory=4.50GB VCores=1
> >
> > 01:EXCHANGE [UNPARTITIONED]
> > |  hosts=4 per-host-mem=unavailable
> > |  tuple-ids=0 row-size=1.67KB cardinality=1155911
> > |
> > 00:SCAN HDFS [sjzy.np_2017_np601, RANDOM]
> >    partitions=1/1 files=20 size=1.06GB
> >    predicates: DS_AREACODE LIKE '445281%'
> >    table stats: 11559109 rows total
> >    column stats: all
> >    hosts=4 per-host-mem=4.50GB
> >    tuple-ids=0 row-size=1.67KB cardinality=1155911
> >
> > Estimated Per-Host Mem: 4831838208
> > Estimated Per-Host VCores: 1
> > Request Pool: default-pool
> > ExecSummary:
> > Operator      #Hosts  Avg Time  Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
> > -----------------------------------------------------------------------------------------------------------
> > 01:EXCHANGE        1  32.314ms  32.314ms  317.25K       1.16M          0        -1.00 B  UNPARTITIONED
> > 00:SCAN HDFS      20   1s137ms   1s348ms  317.25K       1.16M  163.85 MB        4.50 GB  sjzy.np_2017_np601
> > Planner Timeline: 53.683ms
> >    - Analysis finished: 24.565ms (24.565ms)
> >    - Equivalence classes computed: 26.389ms (1.823ms)
> >    - Single node plan created: 33.607ms (7.218ms)
> >    - Runtime filters computed: 33.684ms (76.568us)
> >    - Distributed plan created: 39.125ms (5.441ms)
> >    - Planning finished: 53.683ms (14.558ms)
> > Query Timeline: 17m42s
> >    - Start execution: 43.792us (43.792us)
> >    - Planning finished: 60.640ms (60.596ms)
> >    - Ready to start 20 remote fragments: 65.111ms (4.471ms)
> >    - All 20 remote fragments started: 74.572ms (9.461ms)
> >    - Rows available: 744.300ms (669.728ms)
> >    - First row fetched: 790.128ms (45.828ms)
> >    - Unregister query: 17m42s (17m42s)
> >   ImpalaServer:
> >      - ClientFetchWaitTimer: 17m31s
> >      - RowMaterializationTimer: 10s024ms
> > 2017-04-28 19:44 GMT+08:00 Jim Apple :
> >
> >> dev@ does not appear to accept attachments. You can upload it somewhere
> >> and post a link, though.
> >>
> >> On Thu, Apr 27, 2017 at 11:35 PM, 吴朱华  wrote:
> >>
> >> > Oops, I just resend it, you know the chinese network^_^
> >> >
> >> > 2017-04-28 14:20 GMT+08:00 Mostafa Mokhtar :
> >> >
> >> >> Btw the profile wasn't attached.
> >> >> Please resend.
> >> >>
> >> >> On Thu, Apr 27, 2017 at 11:11 PM, 吴朱华  wrote:
> >> >>
> >> >>> Profile is in the attachment, thanks
> >> >>>
> >> >>>
> >> >>> 2017-04-28 13:10 GMT+08:00 Dimitris Tsirogiannis <
> >> >>> dtsirogian...@cloudera.com>:
> >> >>>
> >> >>>> Maybe you also want to post some information about the schema (how
> >> >>>> wide your table is, does it use nested types, etc) as well as the
> >> >>>> profile of the slow query.
> >> >>>>
> >> >>>> Dimitris
> >> >>>>
> >> >>>> On Thu, Apr 27, 2017 at 9:30 PM, 吴朱华  wrote:
> >> >>>>
> >> >>>> > Hi guys:
> >> >>>> > We are facing a big issue when running select * from a big table.
> >> >>>> > It takes 17 minutes to retrieve 400MB of data. It is slow over
> >> >>>> > JDBC as well.
> >> >>>> > Is there any way to improve it?^_^
> >> >>>> >
> >> >>>>
> >> >>>
> >> >>>
> >> >>
> >> >
> >>
>


Scope of abort_on_config_error is too large

2017-04-10 Thread Mostafa Mokhtar
When deploying Impala on hosts without a co-located HDFS DataNode, Impala
won't start unless abort_on_config_error=false is passed as a safety valve.

The concern is that abort_on_config_error covers more than just the
short-circuit read check.

Does it make sense to move the short-circuit read check out of
abort_on_config_error, or to put it behind a separate flag?

fe/src/main/java/org/apache/impala/service/JniFrontend.java
  /**
   * Returns an error string describing all configuration issues. If no config issues are
   * found, returns an empty string.
   */
  public String checkConfiguration() {
    StringBuilder output = new StringBuilder();
    output.append(checkLogFilePermission());
    output.append(checkFileSystem(CONF));
    output.append(checkShortCircuitRead(CONF));
    return output.toString();
  }

be/src/service/impala-server.cc
  Status status = exec_env_->frontend()->ValidateSettings();
  if (!status.ok()) {
    LOG(ERROR) << status.GetDetail();
    if (FLAGS_abort_on_config_error) {
      CLEAN_EXIT_WITH_ERROR(
          "Aborting Impala Server startup due to improper configuration");
    }
  }

  status = exec_env->tmp_file_mgr()->Init(exec_env->metrics());
  if (!status.ok()) {
    LOG(ERROR) << status.GetDetail();
    if (FLAGS_abort_on_config_error) {
      CLEAN_EXIT_WITH_ERROR("Aborting Impala Server startup due to improperly "
          "configured scratch directories.");
    }
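
One way the separate-flag idea could look, as a rough standalone sketch rather than a patch against Impala: the short-circuit read result is kept apart from the other checks and gated by its own switch. abort_on_config_error is the existing flag; abort_on_short_circuit_read_error, StartupFlags, ConfigCheckResult and ShouldAbortStartup are names invented here for illustration only.

```cpp
#include <cassert>
#include <string>

// Hypothetical startup switches. abort_on_config_error mirrors the existing
// flag; abort_on_short_circuit_read_error is an invented name for this sketch.
struct StartupFlags {
  bool abort_on_config_error = true;
  bool abort_on_short_circuit_read_error = true;
};

// Results of the individual checks; an empty string means the check passed.
struct ConfigCheckResult {
  std::string log_file_permission_error;
  std::string file_system_error;
  std::string short_circuit_read_error;
};

// Returns true if startup should be aborted. The short-circuit read check is
// gated by its own switch, so it can be relaxed without muting the others.
bool ShouldAbortStartup(const StartupFlags& flags, const ConfigCheckResult& r) {
  const bool general_error =
      !r.log_file_permission_error.empty() || !r.file_system_error.empty();
  if (general_error && flags.abort_on_config_error) return true;
  if (!r.short_circuit_read_error.empty() &&
      flags.abort_on_short_circuit_read_error) {
    return true;
  }
  return false;
}
```

With this split, a host without a local DataNode could turn off only the short-circuit read gate and still fail fast on a bad file system or log directory configuration.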


"status.cc:114] Query id 0:0 not found" every second in Impalad logs

2016-09-23 Thread Mostafa Mokhtar
Recently I started seeing what appears to be an error message followed by a
broken stack trace that gets printed every second. The message keeps getting
printed even when no query is running.

This appears to be coming from ImpalaServer::GetRuntimeProfileStr
or ImpalaServer::GetExecSummary.

Does anyone know what is going on?

I0923 16:17:31.101012  1709 status.cc:114] Query id 0:0 not found.
@   0x84e789  (unknown)
@   0xac71b4  (unknown)
@   0xaf42ec  (unknown)
@   0xbf9815  (unknown)
@   0xbfac35  (unknown)
@   0xc0e180  (unknown)
@   0xc108fd  (unknown)
@   0xc10f8d  (unknown)
@   0x345aa079d1  (unknown)
@   0x345a6e88fd  (unknown)
I0923 16:17:32.103350  1720 status.cc:114] Query id 0:0 not found.
@   0x84e789  (unknown)
@   0xac71b4  (unknown)
@   0xaf42ec  (unknown)
@   0xbf9815  (unknown)
@   0xbfac35  (unknown)
@   0xc0e180  (unknown)
@   0xc108fd  (unknown)
@   0xc10f8d  (unknown)
@   0x345aa079d1  (unknown)
@   0x345a6e88fd  (unknown)
I0923 16:17:33.102648  1709 status.cc:114] Query id 0:0 not found.
@   0x84e789  (unknown)
@   0xac71b4  (unknown)
@   0xaf42ec  (unknown)
@   0xbf9815  (unknown)
@   0xbfac35  (unknown)
@   0xc0e180  (unknown)
@   0xc108fd  (unknown)
@   0xc10f8d  (unknown)
@   0x345aa079d1  (unknown)
@   0x345a6e88fd  (unknown)
I0923 16:17:34.103737  1720 status.cc:114] Query id 0:0 not found.
@   0x84e789  (unknown)
@   0xac71b4  (unknown)
@   0xaf42ec  (unknown)
@   0xbf9815  (unknown)
@   0xbfac35  (unknown)
@   0xc0e180  (unknown)
@   0xc108fd  (unknown)
@   0xc10f8d  (unknown)
@   0x345aa079d1  (unknown)
@   0x345a6e88fd  (unknown)
I0923 16:17:35.103298  1709 status.cc:114] Query id 0:0 not found.
@   0x84e789  (unknown)
@   0xac71b4  (unknown)
@   0xaf42ec  (unknown)
@   0xbf9815  (unknown)
@   0xbfac35  (unknown)
@   0xc0e180  (unknown)
@   0xc108fd  (unknown)
@   0xc10f8d  (unknown)
@   0x345aa079d1  (unknown)
@   0x345a6e88fd  (unknown)


[Impala-ASF-CR] IMPALA-2932: Extend DistributedPlanner to account for hash table build cost

2016-08-24 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: IMPALA-2932: Extend DistributedPlanner to account for hash 
table build cost
..


Patch Set 3:

I kicked off a TPC-H/DS run

-- 
To view, visit http://gerrit.cloudera.org:8080/4098
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I03a0f56f69c8deae68d48dfdb9dc95b71aec11f1
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-CR](cdh5-trunk) Enable TPC-H workload for Kudu tables

2016-07-20 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: Enable TPC-H workload for Kudu tables
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3633/5/testdata/workloads/tpch/queries/tpch-kudu-q1.test
File testdata/workloads/tpch/queries/tpch-kudu-q1.test:

Line 8:   round(sum(l_extendedprice), 1) as sum_base_price,
Why is the rounding needed?


-- 
To view, visit http://gerrit.cloudera.org:8080/3633
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3a5de71fefa92a78970226d8f49ef445d28f9289
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) Enable TPC-H workload for Kudu tables

2016-07-20 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: Enable TPC-H workload for Kudu tables
..


Patch Set 5:

(3 comments)

Why add tpch-kudu-q*.test?
The queries should be identical to the already existing TPC-H queries, correct?

http://gerrit.cloudera.org:8080/#/c/3633/5/testdata/datasets/tpch/tpch_schema_template.sql
File testdata/datasets/tpch/tpch_schema_template.sql:

PS5, Line 46: distribute by hash (l_orderkey, l_partkey, l_suppkey, 
l_linenumber) into 9 buckets
> Kind of ad-hoc. We run the tests in a pseudo-cluster of size 3, so I picked
Please change to 
distribute by hash (l_orderkey) into 9 buckets


PS5, Line 51:   'kudu.key_columns' = 'l_orderkey, l_partkey, l_suppkey, 
l_linenumber'
> I used the official TPC-H spec for that.
Please change to 
'kudu.key_columns' = 'l_shipdate,l_orderkey, l_linenumber'


Line 263:   'kudu.key_columns' = 'o_orderkey'
Please change to 'o_orderdate,o_orderkey'

Will need to change column order for the DDL to work.


-- 
To view, visit http://gerrit.cloudera.org:8080/3633
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3a5de71fefa92a78970226d8f49ef445d28f9289
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) IMPALA-2328 Parquet scan should use min/max stats

2016-07-12 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: IMPALA-2328 Parquet scan should use min/max stats
..


Patch Set 1:

I tried the patch and am hitting some exceptions:

#0  0x7f8bbb197cc9 in __GI_raise (sig=sig@entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x7f8bbb19b0d8 in __GI_abort () at abort.c:89
#2  0x027dd644 in google::DumpStackTraceAndExit() ()
#3  0x027d6add in google::LogMessage::Fail() ()
#4  0x027d9406 in google::LogMessage::SendToLog() ()
#5  0x027d65fd in google::LogMessage::Flush() ()
#6  0x027d9eae in google::LogMessageFatal::~LogMessageFatal() ()
#7  0x01731516 in impala::PartitionedHashJoinNode::NextProbeRowBatch 
(this=0x93e1b00, state=0x8b5b600, out_batch=0xd1e6520)
at 
/home/mmokhtar/workspace/Impala/be/src/exec/partitioned-hash-join-node.cc:755
#8  0x01735a90 in impala::PartitionedHashJoinNode::GetNext 
(this=0x93e1b00, state=0x8b5b600, out_batch=0xd1e6520, eos=0x93e17e1)
at 
/home/mmokhtar/workspace/Impala/be/src/exec/partitioned-hash-join-node.cc:1031
#9  0x01794c3b in impala::BlockingJoinNode::Open (this=0x93e1680, 
state=0x8b5b600) at 
/home/mmokhtar/workspace/Impala/be/src/exec/blocking-join-node.cc:222
#10 0x0172bd24 in impala::PartitionedHashJoinNode::Open 
(this=0x93e1680, state=0x8b5b600) at 
/home/mmokhtar/workspace/Impala/be/src/exec/partitioned-hash-join-node.cc:254
#11 0x01794645 in impala::BlockingJoinNode::Open (this=0x93e1200, 
state=0x8b5b600) at 
/home/mmokhtar/workspace/Impala/be/src/exec/blocking-join-node.cc:195
#12 0x0172bd24 in impala::PartitionedHashJoinNode::Open 
(this=0x93e1200, state=0x8b5b600) at 
/home/mmokhtar/workspace/Impala/be/src/exec/partitioned-hash-join-node.cc:254
#13 0x0170accd in impala::PartitionedAggregationNode::Open 
(this=0x8d64f00, state=0x8b5b600) at 
/home/mmokhtar/workspace/Impala/be/src/exec/partitioned-aggregation-node.cc:299
#14 0x0194e542 in impala::PlanFragmentExecutor::OpenInternal 
(this=0x838a4d8) at 
/home/mmokhtar/workspace/Impala/be/src/runtime/plan-fragment-executor.cc:366

Can you please try running this query 
https://github.com/cloudera/Impala/blob/cdh5-trunk/testdata/workloads/tpch/queries/tpch-q10.test

-- 
To view, visit http://gerrit.cloudera.org:8080/3623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I91de1f4d0fb2a982d06cd344e41901e3bf3c2cea
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Jian Wu 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-CR](cdh5-trunk) IMPALA-3817: Ensure filter hash function is the same on all hardware.

2016-07-12 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: IMPALA-3817: Ensure filter hash function is the same on all 
hardware.
..


Patch Set 1: -Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/3566
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia09b67f9e987af3e2c8ac12c347b95a7e09ce6fa
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-CR](cdh5-trunk) IMPALA-3817: Ensure filter hash function is the same on all hardware.

2016-07-08 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: IMPALA-3817: Ensure filter hash function is the same on all 
hardware.
..


Patch Set 1: -Code-Review Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/3566
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia09b67f9e987af3e2c8ac12c347b95a7e09ce6fa
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-CR](cdh5-trunk) IMPALA-3817: Ensure filter hash function is the same on all hardware.

2016-07-08 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: IMPALA-3817: Ensure filter hash function is the same on all 
hardware.
..


Patch Set 1: Code-Review+1 Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/3566
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia09b67f9e987af3e2c8ac12c347b95a7e09ce6fa
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-CR](cdh5-trunk) IMPALA-3817: Ensure filter hash function is the same on all hardware.

2016-07-08 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: IMPALA-3817: Ensure filter hash function is the same on all 
hardware.
..


Patch Set 1: -Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/3566
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia09b67f9e987af3e2c8ac12c347b95a7e09ce6fa
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-CR](cdh5-trunk) IMPALA-3766: Applying LZ4 compression on buffers before spilling

2016-07-06 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: IMPALA-3766:  Applying LZ4 compression on buffers before 
spilling
..


Patch Set 2:

Can you please attach the before and after profiles to IMPALA-3766?

-- 
To view, visit http://gerrit.cloudera.org:8080/3478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4d49bd8d6d7643c84cefd1274c18b52907ca1488
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-CR](cdh5-trunk) IMPALA-3735: Add per-fragment information to debug webpage

2016-07-06 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: IMPALA-3735: Add per-fragment information to debug webpage
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3323/4/be/src/service/fragment-mgr.cc
File be/src/service/fragment-mgr.cc:

Line 69: fragments.PushBack(fragment, document->GetAllocator());
Is it possible to add number of completed and assigned scan ranges?


-- 
To view, visit http://gerrit.cloudera.org:8080/3323
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1d995da50c3a119b7aaf04d6f87e60e9e573a5ed
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) IMPALA-3817: Ensure filter hash function is the same on all hardware.

2016-07-05 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: IMPALA-3817: Ensure filter hash function is the same on all 
hardware.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3566/1//COMMIT_MSG
Commit Message:

Line 12: 16-node cluster showed negligible performance differences.
Can you please run some of the bloom filter benchmarks and rerun?


-- 
To view, visit http://gerrit.cloudera.org:8080/3566
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia09b67f9e987af3e2c8ac12c347b95a7e09ce6fa
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) Use AVX2 operations to speedup Bloom filters by 10-100%.

2016-06-18 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: Use AVX2 operations to speedup Bloom filters by 10-100%.
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/3338/7/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

Line 176:   if (CpuInfo::IsSupported(CpuInfo::AVX2)) {
> That codegen improvement isn't actually vapourware, I have a draft patch th
Code-gen aside.
As far as I know, it is not recommended to re-check conditions that don't
change over the lifetime of an object.

And if use_avx2 is defined as below, it should be read from a register in
BucketFindAVX2 and BucketInsertAVX2.


private:
  /// log_directory_space_ is the log (base 2) of the number of buckets in the
  /// directory.
  const int log_num_buckets_;

  const bool use_avx2;

  /// directory_mask_ is (1 << log_num_buckets_) - 1. It is precomputed for
  /// efficiency reasons.
  const uint32_t directory_mask_;
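
To make the suggestion concrete, here is a minimal sketch of caching a
hardware check in a const member at construction time. HasAvx2() and
TinyFilter are hypothetical stand-ins, not Impala's CpuInfo or BloomFilter,
and __builtin_cpu_supports is assumed to be available (it is a GCC/Clang
builtin on x86); on other toolchains the sketch simply reports false.

```cpp
#include <cstdint>

// Hypothetical stand-in for CpuInfo::IsSupported(CpuInfo::AVX2).
inline bool HasAvx2() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  return __builtin_cpu_supports("avx2");
#else
  return false;
#endif
}

// Hypothetical filter: hardware support is queried once, in the constructor,
// and the cached flag selects the code path on every call afterwards.
class TinyFilter {
 public:
  TinyFilter() : use_avx2_(HasAvx2()) {}

  void Insert(uint32_t hash) {
    if (use_avx2_) {
      InsertWide(hash);
    } else {
      InsertScalar(hash);
    }
  }

  bool Find(uint32_t hash) const { return (bits_ & Mask(hash)) == Mask(hash); }
  bool use_avx2() const { return use_avx2_; }

 private:
  // Two bit positions derived from the hash (toy scheme, for illustration).
  static uint64_t Mask(uint32_t hash) {
    return (1ULL << (hash & 63)) | (1ULL << ((hash >> 6) & 63));
  }
  // Both paths set the same bits here; a real filter would use intrinsics
  // in the wide path.
  void InsertWide(uint32_t hash) { bits_ |= Mask(hash); }
  void InsertScalar(uint32_t hash) { bits_ |= Mask(hash); }

  const bool use_avx2_;
  uint64_t bits_ = 0;
};
```

Whether this beats re-checking on every call is something to measure; the
sketch only shows the shape of the change.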


Line 254:   const bool result = _mm256_testc_si256(bucket, mask);
Did you check that AVX2 still provides a speedup for highly selective filters?
BucketFind can return early, at the first BUCKET_WORD that does not match.
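
The trade-off behind this question can be sketched as follows. This is an
illustration only: the bucket width of eight 32-bit words and both function
names are assumptions, and the second function emulates in plain C++ what a
single wide test such as _mm256_testc_si256 computes ((~bucket & mask) == 0)
rather than using the intrinsic.

```cpp
#include <array>
#include <cstdint>

constexpr int kBucketWords = 8;  // assumed bucket width: 8 x 32-bit words
using Bucket = std::array<uint32_t, kBucketWords>;

// Scalar lookup: stops at the first word that is missing a required bit.
// The number of words inspected is reported through *words_checked.
inline bool BucketFindScalar(const Bucket& bucket, const Bucket& mask,
                             int* words_checked) {
  for (int i = 0; i < kBucketWords; ++i) {
    *words_checked = i + 1;
    if ((bucket[i] & mask[i]) != mask[i]) return false;
  }
  return true;
}

// Branch-free lookup: always touches every word, as one wide test would.
inline bool BucketFindAllWords(const Bucket& bucket, const Bucket& mask) {
  uint32_t missing = 0;
  for (int i = 0; i < kBucketWords; ++i) missing |= (~bucket[i] & mask[i]);
  return missing == 0;
}
```

For a filter that rejects most probes, the scalar version often inspects a
single word, which is why a benchmark on highly selective filters is worth
running.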


-- 
To view, visit http://gerrit.cloudera.org:8080/3338
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6fef4f6652876f8fd7e3f0e41431702380418c98
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) Use AVX2 operations to speedup Bloom filters by 10-100%.

2016-06-17 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: Use AVX2 operations to speedup Bloom filters by 10-100%.
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3338/7/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

Line 176:   if (CpuInfo::IsSupported(CpuInfo::AVX2)) {
> "const" doesn't make something a compile time constant. It just means that 
No, I wasn't referring to a static const. I expect the compiler to hoist the
bool into a register, and I expect the CMP instruction to be cheaper than
MOV+TESTL. My preference is to make the code as efficient as possible now
rather than wait for code-gen, since code-gen work tends to take a lot of time.


-- 
To view, visit http://gerrit.cloudera.org:8080/3338
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6fef4f6652876f8fd7e3f0e41431702380418c98
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) Use AVX2 operations to speedup Bloom filters by 10-100%.

2016-06-15 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: Use AVX2 operations to speedup Bloom filters by 10-100%.
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3338/7/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

Line 176:   if (CpuInfo::IsSupported(CpuInfo::AVX2)) {
> There should be no cost difference.  IsSupported is just a bit-test which i
So TESTL is cheaper than cmp eax, 1?
Why would the const boolean not get read from a register? Can't the compiler
optimize that?


-- 
To view, visit http://gerrit.cloudera.org:8080/3338
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6fef4f6652876f8fd7e3f0e41431702380418c98
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) Use AVX2 operations to speedup Bloom filters by 10-100%.

2016-06-15 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: Use AVX2 operations to speedup Bloom filters by 10-100%.
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3338/7/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

Line 176:   if (CpuInfo::IsSupported(CpuInfo::AVX2)) {
> I could call it in the constructor, but given how cheap it is to check, do 
It will save a couple of instructions, and ideally the boolean will be read
from a register.
Apart from the cost, it is good practice to check the condition once and cache
the result, as opposed to checking it for each row.


-- 
To view, visit http://gerrit.cloudera.org:8080/3338
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6fef4f6652876f8fd7e3f0e41431702380418c98
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) Use AVX2 operations to speedup Bloom filters by 10-100%.

2016-06-14 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: Use AVX2 operations to speedup Bloom filters by 10-100%.
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3338/7/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

Line 176:   if (CpuInfo::IsSupported(CpuInfo::AVX2)) {
Can you call CpuInfo::IsSupported once at Init or BloomFilter() and use a const 
bool instead?


-- 
To view, visit http://gerrit.cloudera.org:8080/3338
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6fef4f6652876f8fd7e3f0e41431702380418c98
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) IMPALA-2809: improve ByteSwap with builtin function or SSE or AVX2

2016-05-17 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has abandoned this change.

Change subject: IMPALA-2809: improve ByteSwap with builtin function or SSE or 
AVX2
..


Abandoned

-- 
To view, visit http://gerrit.cloudera.org:8080/1813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I1fd3d3a040fbc812d285a7f882f9d759950d9a41
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Zuo Wang 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 


[Impala-CR](cdh5-trunk) Renamed conjunct_ordering.test to primitive_conjunct_ordering.test in targeted-perf

2016-05-12 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: Renamed conjunct_ordering.test to 
primitive_conjunct_ordering.test in targeted-perf
..


Patch Set 1:

The query is not valid because the order in the plan doesn't change from the
order in the query, so the test is not really verifying much.

-- 
To view, visit http://gerrit.cloudera.org:8080/3021
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ica8db68141ef653b0b01a7cfa7773302717a35a2
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-CR](cdh5-trunk) IMPALA-1928 : Impala ODBC bad performance with Kerberos mechanism

2016-05-12 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: IMPALA-1928 : Impala ODBC bad performance with Kerberos 
mechanism
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/2769/2/be/src/rpc/thrift-client.h
File be/src/rpc/thrift-client.h:

Line 144:   transport_ = socket_;
> I think we shouldn't call WrapClientTransport if socket_ is null. Can you j
What is the expected behavior if socket_ is NULL?


-- 
To view, visit http://gerrit.cloudera.org:8080/2769
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iebcf6457091aef1fc0e5bd1549b3fcbafc5560d9
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Mostafa Mokhtar 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) IMPALA-3480: Add query options for min/max filter sizes

2016-05-11 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: IMPALA-3480: Add query options for min/max filter sizes
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/2966/4/be/src/runtime/runtime-filter.h
File be/src/runtime/runtime-filter.h:

Line 103:   static const int64_t MIN_BLOOM_FILTER_SIZE = 4 * 1024;   // 4KB
If there are no stats, would the filter size default to 4KB?
I wonder if we should increase the default min size to 1MB.
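
A small sketch of the behavior being asked about, under the assumption (not
confirmed in this thread) that a missing estimate falls back to the minimum
size. ChooseFilterSize and kProposedMinSize are hypothetical names; only the
4KB constant comes from the line under review.

```cpp
#include <algorithm>
#include <cstdint>

constexpr int64_t kMinBloomFilterSize = 4 * 1024;  // 4KB, as in the reviewed line
constexpr int64_t kProposedMinSize = 1024 * 1024;  // 1MB, the value floated above

// Hypothetical helper: 'estimated_bytes' <= 0 models "no stats available".
// Otherwise the estimate is clamped into [min_bytes, max_bytes].
inline int64_t ChooseFilterSize(int64_t estimated_bytes, int64_t min_bytes,
                                int64_t max_bytes) {
  if (estimated_bytes <= 0) return min_bytes;
  return std::min(std::max(estimated_bytes, min_bytes), max_bytes);
}
```

Under that assumption, raising the minimum only changes the outcome for
queries with no estimate or a very small one.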


-- 
To view, visit http://gerrit.cloudera.org:8080/2966
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5c13c200a0f1855f38a5da50ca34a737e741868b
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) Renamed conjunct_ordering.test to primitive_conjunct_ordering.test in targeted-perf

2016-05-10 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: Renamed conjunct_ordering.test to 
primitive_conjunct_ordering.test in targeted-perf
..


Patch Set 1:

Also why are the predicates which are part of the join not re-ordered?
Does the change only support ordering at the scan?

-- 
To view, visit http://gerrit.cloudera.org:8080/3021
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ica8db68141ef653b0b01a7cfa7773302717a35a2
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: No


[Impala-CR](cdh5-trunk) Renamed conjunct_ordering.test to primitive_conjunct_ordering.test in targeted-perf

2016-05-10 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: Renamed conjunct_ordering.test to 
primitive_conjunct_ordering.test in targeted-perf
..


Patch Set 1: -Code-Review

Just noticed that this query is not valid as the predicates are not pushed to 
the scan and are applied at the join. 

| Explain String |
| Estimated Per-Host Requirements: Memory=4.43GB VCores=2 |
| |
| 06:AGGREGATE [FINALIZE] |
| |  output: sum:merge(l_extendedprice * (1 - l_discount

[Impala-CR](cdh5-trunk) Renamed conjunct_ordering.test to primitive_conjunct_ordering.test in targeted-perf

2016-05-10 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: Renamed conjunct_ordering.test to 
primitive_conjunct_ordering.test in targeted-perf
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/3021
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ica8db68141ef653b0b01a7cfa7773302717a35a2
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: No


[Impala-CR](cdh5-trunk) IMPALA-1928 : Impala ODBC bad performance with Kerberos mechanism

2016-05-06 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: IMPALA-1928 : Impala ODBC bad performance with Kerberos 
mechanism
..


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/2769/1//COMMIT_MSG
Commit Message:

Line 9: On the client side, the wrapping is backwards:
  : SaslTransport(TBufferedTransport(socket))
> We can rephrase this so it's more clear:
Done


Line 18: 
> You can conclude by saying:
Done


http://gerrit.cloudera.org:8080/#/c/2769/1/be/src/rpc/thrift-client.h
File be/src/rpc/thrift-client.h:

Line 137: socker
> not your change, but change to "socket".
Done


Line 144:   transport_ = socket_;
:   auth_provider_->WrapClientTransport(address_.hostname, transport_, 
service_name,
:   &transport_);
> Will this work fine if socket_ is NULL?
Done


Line 144:   transport_ = socket_;
:   auth_provider_->WrapClientTransport(address_.hostname, transport_, 
service_name,
:   &transport_);
> Not sure what the expected behavior is if socket_ is NULL. 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/2769
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iebcf6457091aef1fc0e5bd1549b3fcbafc5560d9
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Mostafa Mokhtar 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) Add code review feedback

2016-05-06 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has abandoned this change.

Change subject: Add code review feedback
..


Abandoned

-- 
To view, visit http://gerrit.cloudera.org:8080/2995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: Iad23205f29b3ac13ecfcdbb4b77567f0dea2652b
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Mostafa Mokhtar 


[Impala-CR](cdh5-trunk) IMPALA-1928 : Impala ODBC bad performance with Kerberos mechanism

2016-05-06 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has uploaded a new patch set (#2).

Change subject: IMPALA-1928 : Impala ODBC bad performance with Kerberos 
mechanism
..

IMPALA-1928 : Impala ODBC bad performance with Kerberos mechanism

On the client side, the wrapping of transports is the opposite of
what it is on the server side:
Client side: TSaslTransport(TBufferedTransport(socket))
Server side: TBufferedTransport(TSaslTransport(socket))

When we write a structure, we end up doing lots of write calls which hit the
TSaslTransport, which does no buffering. So it ends up producing an output
that looks like [0, 0, 0, 1], , [0, 0, 0, 1], ,
etc. for each individual write call going into it.
These end up buffered so we don't get lots of tiny packets on the send side.
But on the receiver side, we are doing one recv call per Sasl frame.

This patch reorders the wrapping of transports in the thrift client
so that it matches the order on the thrift server, which improves exchange
performance, bringing it within 10% of non-Kerberos.

Change-Id: Iebcf6457091aef1fc0e5bd1549b3fcbafc5560d9
---
M be/src/rpc/thrift-client.h
1 file changed, 7 insertions(+), 5 deletions(-)
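
The effect of the wrapping order described in the commit message can be
shown with a toy model. Everything below is a hypothetical sketch: the
Transport interface, FakeSocket, FramingTransport and BufferedTransport are
illustrative stand-ins and not Thrift's classes. FramingTransport mimics a
layer that emits one length-prefixed frame per write and does no buffering.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Minimal write-only transport interface (hypothetical).
struct Transport {
  virtual ~Transport() = default;
  virtual void Write(const std::string& data) = 0;
  virtual void Flush() = 0;
};

// Stands in for the socket: records how many bytes reach the wire.
struct FakeSocket : Transport {
  std::size_t bytes = 0;
  void Write(const std::string& data) override { bytes += data.size(); }
  void Flush() override {}
};

// Emits one frame (4-byte header placeholder + payload) per Write() call.
struct FramingTransport : Transport {
  explicit FramingTransport(std::shared_ptr<Transport> inner)
      : inner_(std::move(inner)) {}
  void Write(const std::string& data) override {
    ++frames;
    inner_->Write(std::string(4, '\0') + data);
  }
  void Flush() override { inner_->Flush(); }
  int frames = 0;

 private:
  std::shared_ptr<Transport> inner_;
};

// Accumulates writes and forwards them as a single Write() on Flush().
struct BufferedTransport : Transport {
  explicit BufferedTransport(std::shared_ptr<Transport> inner)
      : inner_(std::move(inner)) {}
  void Write(const std::string& data) override { buffer_ += data; }
  void Flush() override {
    if (!buffer_.empty()) inner_->Write(buffer_);
    buffer_.clear();
    inner_->Flush();
  }

 private:
  std::shared_ptr<Transport> inner_;
  std::string buffer_;
};

// Writes 'n' one-byte fields through 'outer', then flushes.
inline void WriteFields(Transport& outer, int n) {
  for (int i = 0; i < n; ++i) outer.Write("x");
  outer.Flush();
}
```

With framing on the outside (the old client order), 100 one-byte writes
produce 100 frames inside a single send. With buffering on the outside (the
server order), the same writes produce one frame, so the receiver needs far
fewer reads.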


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/69/2769/2
-- 
To view, visit http://gerrit.cloudera.org:8080/2769
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iebcf6457091aef1fc0e5bd1549b3fcbafc5560d9
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Mostafa Mokhtar 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-CR](cdh5-trunk) Add code review feedback

2016-05-06 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/2995

Change subject: Add code review feedback
..

Add code review feedback

Change-Id: Iad23205f29b3ac13ecfcdbb4b77567f0dea2652b
---
M be/src/rpc/thrift-client.h
1 file changed, 1 insertion(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/95/2995/1
-- 
To view, visit http://gerrit.cloudera.org:8080/2995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Iad23205f29b3ac13ecfcdbb4b77567f0dea2652b
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Mostafa Mokhtar 


[Impala-CR](cdh5-trunk) IMPALA-1928 : Impala ODBC bad performance with Kerberos mechanism

2016-05-05 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: IMPALA-1928 : Impala ODBC bad performance with Kerberos 
mechanism
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/2769/1/be/src/rpc/thrift-client.h
File be/src/rpc/thrift-client.h:

Line 144:   transport_ = socket_;
:   auth_provider_->WrapClientTransport(address_.hostname, transport_, 
service_name,
:   &transport_);
> Will this work fine if socket_ is NULL?
I am not sure what the expected behavior is if socket_ is NULL.
It would make sense to fail the constructor and propagate an exception of some
sort, yet I am not sure the code is meant to do that.


-- 
To view, visit http://gerrit.cloudera.org:8080/2769
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iebcf6457091aef1fc0e5bd1549b3fcbafc5560d9
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Mostafa Mokhtar 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) IMPALA-3452: S3: Disable Impala staging for INSERTs via flag for speedup

2016-05-02 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: IMPALA-3452: S3: Disable Impala staging for INSERTs via flag 
for speedup
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/2905/1/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

Line 52: DEFINE_bool(s3_skip_insert_staging, false, "Enable to skip the staging 
step for INSERTs "
> We currently have local staging before we send it to S3, so we can do away 
I recommend changing this to a query option.


-- 
To view, visit http://gerrit.cloudera.org:8080/2905
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iff9620d41ba0d5fb1aa0c9f4abb48866fc2b0698
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) IMPALA-3286: Software prefetching for hash table build.

2016-05-01 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: IMPALA-3286: Software prefetching for hash table build.
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/2896/2/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

Line 510: scoped_ptr
> Good point. Converted to using vector.
I recommend using arrays as opposed to vectors, since we already know the size.
In hot loops I have seen vectors perform significantly slower than arrays.
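
For illustration, a fixed-size buffer for a batch whose size is known at
compile time might look like the sketch below. kBatchSize, HashBatch and
HashKeys are hypothetical names, and the multiplicative hash is a toy; the
point is only that std::array needs no heap allocation and carries no
size or capacity bookkeeping.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kBatchSize = 1024;  // assumed compile-time batch size

// Hash values for one batch live in a fixed-size array.
struct HashBatch {
  std::array<uint32_t, kBatchSize> hashes;
};

// Fills the first 'n' slots with a toy multiplicative hash of each key.
// 'n' must not exceed kBatchSize.
inline void HashKeys(const uint32_t* keys, std::size_t n, HashBatch* out) {
  for (std::size_t i = 0; i < n; ++i) out->hashes[i] = keys[i] * 2654435761u;
}
```

Any speed difference against std::vector depends on the compiler and the
loop, so it should be confirmed with a benchmark.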


-- 
To view, visit http://gerrit.cloudera.org:8080/2896
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib85e7fc162ad25c849b9e716b629e226697cd940
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) IMPALA-2736: Basic column-wise slot materialization in Parquet scanner.

2016-04-22 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: IMPALA-2736: Basic column-wise slot materialization in Parquet 
scanner.
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/2779/5/be/src/util/rle-encoding.h
File be/src/util/rle-encoding.h:

Line 250:   if (repeat_count_ == 0) {
Compared against "repeat_count_ + literal_count_ == 0": the current change in
review is marginally faster, and both are still significantly better than
  if (UNLIKELY(literal_count_ == 0 && repeat_count_ == 0)) {
    if (!NextCounts()) return false;
  }
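
The two ways of testing "both counts are zero" can be written as below. The
function names are hypothetical, and the single-test form uses a bitwise OR
instead of the "+" quoted above, a deliberate swap so that it cannot wrap
around for unsigned counts.

```cpp
#include <cstdint>

// Two-condition form: short-circuit evaluation may compile to two
// compare-and-branch pairs.
inline bool BothZeroBranchy(uint32_t repeat_count, uint32_t literal_count) {
  return literal_count == 0 && repeat_count == 0;
}

// Single-test form: combines the counts first, then compares once.
inline bool BothZeroSingleTest(uint32_t repeat_count, uint32_t literal_count) {
  return (repeat_count | literal_count) == 0;
}
```

Both return the same result for every input; which one is faster in the
decoder loop is an empirical question, as the measurements above suggest.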


-- 
To view, visit http://gerrit.cloudera.org:8080/2779
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I72a613fa805c542e39df20588fb25c57b5f139aa
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Skye Wanderman-Milne 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) PREVIEW: Basic column-wise slot materialization in Parquet scanner.

2016-04-15 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: PREVIEW: Basic column-wise slot materialization in Parquet 
scanner.
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/2779/2/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 1728: if (EvalRuntimeFilters(output_row) && ExecNode::EvalConjuncts(
Consider adding a short circuit for the case where there are no RuntimeFilters
or Conjuncts to evaluate.
This should speed up scans without filters by 10%.
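
A minimal sketch of such a short circuit, using hypothetical types (Row,
Predicate, CountQualifyingRows) rather than the scanner's real ones: when
both lists are empty, every row qualifies, so the per-row evaluation is
skipped entirely.

```cpp
#include <functional>
#include <vector>

struct Row { int value; };
using Predicate = std::function<bool(const Row&)>;

// Returns true only if every predicate accepts the row.
inline bool PassesAll(const std::vector<Predicate>& preds, const Row& row) {
  for (const Predicate& p : preds) {
    if (!p(row)) return false;
  }
  return true;
}

// Hypothetical scan loop with a fast path for "nothing to evaluate".
inline int CountQualifyingRows(const std::vector<Row>& rows,
                               const std::vector<Predicate>& filters,
                               const std::vector<Predicate>& conjuncts) {
  if (filters.empty() && conjuncts.empty()) {
    return static_cast<int>(rows.size());
  }
  int count = 0;
  for (const Row& row : rows) {
    if (PassesAll(filters, row) && PassesAll(conjuncts, row)) ++count;
  }
  return count;
}
```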


-- 
To view, visit http://gerrit.cloudera.org:8080/2779
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I72a613fa805c542e39df20588fb25c57b5f139aa
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) IMPALA-1928 : Impala ODBC bad performance with Kerberos mechanism

2016-04-12 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/2769

Change subject: IMPALA-1928 : Impala ODBC bad performance with Kerberos 
mechanism
..

IMPALA-1928 : Impala ODBC bad performance with Kerberos mechanism

On the client side, the wrapping is backwards:
SaslTransport(TBufferedTransport(socket))

When we write a structure, we end up doing lots of write calls which hit the
TSaslTransport, which does no buffering. So it ends up producing an output
that looks like [0, 0, 0, 1], , [0, 0, 0, 1], ,
etc. for each individual write call going into it.
These end up buffered so we don't get lots of tiny packets on the send side.
But on the receiver side, we are doing one recv call per Sasl frame.

Change-Id: Iebcf6457091aef1fc0e5bd1549b3fcbafc5560d9
---
M be/src/rpc/thrift-client.h
1 file changed, 6 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/69/2769/1
-- 
To view, visit http://gerrit.cloudera.org:8080/2769
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Iebcf6457091aef1fc0e5bd1549b3fcbafc5560d9
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Mostafa Mokhtar 


Re: Dynamic Runtime Filter Question

2016-04-08 Thread Mostafa Mokhtar
JOIN as opposed to
>>> other kinds of joins, ON clause with several ANDed predicates, and BETWEEN
>>> in the WHERE clause.)
>>>
>>> We're planning to add more such examples in upcoming doc refreshes.
>>> Just want to make sure we cover the kinds of areas you're curious about.
>>> Once 5.7 is installed somewhere to try out, it's straightforward to verify
>>> for particular combinations of tables and queries using EXPLAIN and looking
>>> for the extra lines representing filters.
>>>
>>> John
>>>
>>>

Re: Dynamic Runtime Filter Question

2016-04-08 Thread Mostafa Mokhtar
Hi Ken,

Yes, the query provided should work. I validated that the query below, which
is fairly similar to the one listed above, works as expected.

select
count(*)
from
store_sales_mp,
date_dim
where
ss_sold_date_sk = d_date_sk
and ss_store_sk = d_dow
and d_year = 1999
and d_moy = 1


store_sales_mp is partitioned on two columns :
  | # col_name      | data_type |
  | ss_sold_date_sk | bigint    |
  | ss_store_sk     | bigint    |



The query execution summary shows that only a small number of rows from
store_sales_mp qualify at the scan:

+-----------------+--------+----------+----------+--------+------------+----------+---------------+--------------------------------------------+
| Operator        | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows | Peak Mem | Est. Peak Mem | Detail                                     |
+-----------------+--------+----------+----------+--------+------------+----------+---------------+--------------------------------------------+
| 06:AGGREGATE    | 1      | 160.66ms | 160.66ms | 1      | 1          | 88.00 KB | -1 B          | FINALIZE                                   |
| 05:EXCHANGE     | 1      | 137.98us | 137.98us | 19     | 1          | 0 B      | -1 B          | UNPARTITIONED                              |
| 03:AGGREGATE    | 19     | 151.01ms | 183.65ms | 19     | 1          | 8.04 MB  | 10.00 MB      |                                            |
| 02:HASH JOIN    | 19     | 101.27ms | 185.51ms | 4.82M  | 43.20B     | 2.02 MB  | 3.89 KB       | INNER JOIN, BROADCAST                      |
| |--04:EXCHANGE  | 19     | 14.54us  | 25.36us  | 589    | 181        | 0 B      | 0 B           | BROADCAST                                  |
| |  01:SCAN HDFS | 1      | 20.39ms  | 20.39ms  | 31     | 181        | 1.82 MB  | 64.00 MB      | tpcds_15000_decimal_parquet.date_dim       |
| 00:SCAN HDFS    | 19     | 37.54ms  | 81.18ms  | 45.00M | 43.20B     | 9.84 MB  | 0 B           | tpcds_15000_decimal_parquet.store_sales_mp |
+-----------------+--------+----------+----------+--------+------------+----------+---------------+--------------------------------------------+



Finally from the query profile

  HDFS_SCAN_NODE (id=0):(Total: 37.539ms, non-child: 37.539ms, %
non-child: 100.00%)
 - AverageHdfsReadThreadConcurrency: 0.00
 - AverageScannerThreadConcurrency: 0.93
 - BytesRead: 489.47 KB (501221)
 - BytesReadDataNodeCache: 0
 - BytesReadLocal: 468.42 KB (479663)
 - BytesReadRemoteUnexpected: 21.05 KB (21557)
 - BytesReadShortCircuit: 468.42 KB (479663)
 - DecompressionTime: 0.000ns
 - MaxCompressedTextFileLength: 0
 - NumColumns: 0 (0)
 - NumDisksAccessed: 4 (4)
 - NumRowGroups: 0 (0)
 - NumScannerThreadsStarted: 5 (5)
 - PeakMemoryUsage: 9.35 MB (9799572)
 - PerReadThreadRawHdfsThroughput: 901.98 MB/sec
 - RemoteScanRanges: 0 (0)
 - RowsRead: 2.37M (2368521)
 - RowsReturned: 2.37M (2368521)
 - RowsReturnedRate: 62.23 M/sec
 - ScanRangesComplete: 3.21K (3207)
 - ScannerThreadsInvoluntaryContextSwitches: 26 (26)
 - ScannerThreadsTotalWallClockTime: 420.999ms
   - MaterializeTupleTime(*): 471.000ns
   - ScannerThreadsSysTime: 82.142ms
   - ScannerThreadsUserTime: 65.777ms
 - ScannerThreadsVoluntaryContextSwitches: 5.41K (5413)
 - TotalRawHdfsReadTime(*): 1.940ms
 - TotalReadThroughput: 136.84 KB/sec
Filter 0:
   - Files processed: 3.18K (3184)
   - Files rejected: 3.13K (3130)
   - Files total: 3.18K (3184)
   - RowGroups processed: 0 (0)
   - RowGroups rejected: 0 (0)
   - RowGroups total: 0 (0)
   - Rows processed: 0 (0)
   - Rows rejected: 0 (0)
   - Rows total: 0 (0)
   - Splits processed: 4 (4)
   - Splits rejected: 0 (0)
   - Splits total: 4 (4)
Filter 1:
   - Files processed: 54 (54)
   - Files rejected: 49 (49)
   - Files total: 54 (54)
   - RowGroups processed: 0 (0)
   - RowGroups rejected: 0 (0)
   - RowGroups total: 0 (0)
   - Rows processed: 0 (0)
   - Rows rejected: 0 (0)
   - Rows total: 0 (0)
   - Splits processed: 4 (4)
   - Splits rejected: 0 (0)
   - Splits total: 4 (4)
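
To read the filter counters above as ratios: Filter 0 rejected 3130 of 3184
files (about 98.3%), and Filter 1 rejected 49 of 54 files (about 90.7%). The
54 files that Filter 1 processed match the 3184 - 3130 = 54 files that
Filter 0 let through, which is consistent with the filters being applied one
after the other. The helper below is a hypothetical convenience for that
arithmetic, not part of Impala's profile tooling.

```cpp
#include <cstdint>

// Fraction of processed items that a runtime filter rejected.
// Returns 0 when nothing was processed.
inline double RejectionRatio(int64_t rejected, int64_t processed) {
  if (processed == 0) return 0.0;
  return static_cast<double>(rejected) / static_cast<double>(processed);
}
```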



Thanks
Mostafa


On Fri, Apr 8, 2016 at 10:04 AM, Ken Farmer  wrote:

> First off - congrats on this new release.  It looks like a lot of hard
> work went into it.
>
> I'm really excited about dynamic runtime filters, but have a question
> about whether it would apply to the following scenario, which seems oddly
> missing from the documentation.
>
> Assume 2 tables:
> 1.  fact_ids table with 50 billion rows partitioned by year, month, day,
> sensor

[Impala-CR](cdh5-trunk) IMPALA-2680: faster memory copy

2016-04-08 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: IMPALA-2680: faster memory copy
..


Patch Set 10:

@Tim,

Can you try using this query 
https://github.com/cloudera/Impala/blob/cdh5-trunk/testdata/workloads/targeted-perf/queries/primitive_shuffle_join_union_all_with_groupby.test

-- 
To view, visit http://gerrit.cloudera.org:8080/1686
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7f6c046d966883aa66f26d58bee92c427f973e67
Gerrit-PatchSet: 10
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Zuo Wang 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zuo Wang 
Gerrit-HasComments: No


[Impala-CR](cdh5-trunk) IMPALA-2805: Order filters based on selectivity and cost

2016-03-22 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: IMPALA-2805: Order filters based on selectivity and cost
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/2598/1/fe/src/main/java/com/cloudera/impala/planner/PlanNode.java
File fe/src/main/java/com/cloudera/impala/planner/PlanNode.java:

Line 667: double cost = e.getCost() + (totalCost - e.getCost()) * 
e.getSelectivity();
> Data type is taken into account when the costs are computed, since these co
Where is the data type used in the cost formula? Sorry, I couldn't find it.

For instance, "age in (10,12,13)" should be cheaper than
"city in ('New York', 'San Francisco')"; the model should reflect that.
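
To make the concern concrete, here is a minimal sketch of how the formula on line 667 could drive the ordering. It is not the code in the patch: the Pred class, the greedy loop, and all cost and selectivity numbers are assumptions made up for illustration. Only the cost expression itself comes from PlanNode.java.

```java
import java.util.ArrayList;
import java.util.List;

public class PredicateOrdering {
    /** Hypothetical stand-in for an expression with cost and selectivity estimates. */
    public static final class Pred {
        public final String name;
        public final double cost;         // estimated per-row evaluation cost (made-up units)
        public final double selectivity;  // estimated fraction of rows that pass, in [0, 1]

        public Pred(String name, double cost, double selectivity) {
            this.name = name;
            this.cost = cost;
            this.selectivity = selectivity;
        }
    }

    /**
     * Assumed greedy ordering: repeatedly pick the predicate e that minimises
     * e.cost + (totalCost - e.cost) * e.selectivity, i.e. the cost of running e
     * first plus the cost of the rest on the rows that survive e.
     */
    public static List<Pred> order(List<Pred> preds) {
        List<Pred> remaining = new ArrayList<>(preds);
        List<Pred> ordered = new ArrayList<>();
        double totalCost = 0;
        for (Pred p : remaining) {
            totalCost += p.cost;
        }
        while (!remaining.isEmpty()) {
            Pred best = null;
            double bestCost = 0;
            for (Pred e : remaining) {
                double cost = e.cost + (totalCost - e.cost) * e.selectivity;
                if (best == null || cost < bestCost) {
                    best = e;
                    bestCost = cost;
                }
            }
            ordered.add(best);
            remaining.remove(best);
            totalCost -= best.cost;
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Same selectivity for both; the string IN-list is assumed to cost more per row.
        List<Pred> preds = new ArrayList<>();
        preds.add(new Pred("city in ('New York', 'San Francisco')", 13.0, 0.3));
        preds.add(new Pred("age in (10,12,13)", 3.0, 0.3));
        for (Pred p : order(preds)) {
            System.out.println(p.name);
        }
    }
}
```

With equal selectivity, the integer predicate is evaluated first only because it was given a lower cost. If the cost constants do not distinguish data types, the two predicates tie and the ordering cannot reflect the difference.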


-- 
To view, visit http://gerrit.cloudera.org:8080/2598
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I02279a26fbc6308ac5eb819d78345fc010469034
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) IMPALA-2805: Order filters based on selectivity and cost

2016-03-22 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: IMPALA-2805: Order filters based on selectivity and cost
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/2598/1/fe/src/main/java/com/cloudera/impala/analysis/Expr.java
File fe/src/main/java/com/cloudera/impala/analysis/Expr.java:

Line 66:   public final static int ARITHMETIC_OP_COST = 1;
How did you come up with these constants?


http://gerrit.cloudera.org:8080/#/c/2598/1/fe/src/main/java/com/cloudera/impala/planner/PlanNode.java
File fe/src/main/java/com/cloudera/impala/planner/PlanNode.java:

Line 667: double cost = e.getCost() + (totalCost - e.getCost()) * e.getSelectivity();
Why is data type not taken into account?


-- 
To view, visit http://gerrit.cloudera.org:8080/2598
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I02279a26fbc6308ac5eb819d78345fc010469034
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes