GitHub user HeartSaVioR opened a pull request:

    https://github.com/apache/storm/pull/1739

    STORM-1443 Support customizing parallelism in StormSQL

    * Add 'PARALLELISM' to table definition
      * default value is 1
    * Set parallelism on the new stream when creating a stream from a scan
      * downstream operators also have the same parallelism unless repartitioned
      * parallelism is not applied to the output table, since doing so can trigger a repartition
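    
    To illustrate the default, here is a minimal Java sketch of reading the
    clause from a table definition. `ParallelismClause` and `parallelismOf` are
    hypothetical names, and the regex only stands in for StormSQL's real
    parser, which this sketch does not reproduce.
    
    ```java
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical helper, not part of StormSQL. A regex is used purely for
    // illustration; it does not skip matches inside quoted strings.
    final class ParallelismClause {
        // Default stated in the description above.
        static final int DEFAULT_PARALLELISM = 1;

        // Finds "PARALLELISM <n>" anywhere in the statement, case-insensitively.
        private static final Pattern CLAUSE =
            Pattern.compile("\\bPARALLELISM\\s+(\\d+)\\b", Pattern.CASE_INSENSITIVE);

        // Returns the declared parallelism, or the default when the clause is absent.
        static int parallelismOf(String createTable) {
            Matcher m = CLAUSE.matcher(createTable);
            return m.find() ? Integer.parseInt(m.group(1)) : DEFAULT_PARALLELISM;
        }

        public static void main(String[] args) {
            String withClause = "CREATE EXTERNAL TABLE T (id INT PRIMARY KEY) "
                + "LOCATION 'kafka://localhost:2181/brokers?topic=t' PARALLELISM 5";
            String withoutClause = "CREATE EXTERNAL TABLE T (id INT PRIMARY KEY) "
                + "LOCATION 'kafka://localhost:2181/brokers?topic=t'";
            System.out.println(parallelismOf(withClause));
            System.out.println(parallelismOf(withoutClause));
        }
    }
    ```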
    
    Below is a screenshot taken while running the following SQL statements:
    
    <img width="1305" alt="storm-1443-screenshot" src="https://cloud.githubusercontent.com/assets/1317309/19513856/72a944c2-962c-11e6-91d0-2f6f08b7aefd.png">
    
    ```
    CREATE EXTERNAL TABLE APACHE_LOGS (id INT PRIMARY KEY, remote_ip VARCHAR, request_url VARCHAR, request_method VARCHAR, status VARCHAR, request_header_user_agent VARCHAR, time_received_utc_isoformat VARCHAR, time_us DOUBLE) LOCATION 'kafka://localhost:2181/brokers?topic=apachelogs-v2' PARALLELISM 5
    CREATE EXTERNAL TABLE APACHE_SLOW_LOGS (dummy_id INT PRIMARY KEY, request_url VARCHAR, request_method VARCHAR, cnt INT, time_elapsed_ms_min INT, time_elapsed_ms_max INT, time_elapsed_ms_avg INT) LOCATION 'kafka://localhost:2181/brokers?topic=apacheslowlogs-v2' TBLPROPERTIES '{"producer":{"bootstrap.servers":"localhost:9092","acks":"1","key.serializer":"org.apache.storm.kafka.IntSerializer","value.serializer":"org.apache.storm.kafka.ByteBufferSerializer"}}'
    INSERT INTO APACHE_SLOW_LOGS SELECT MIN(ID), REQUEST_URL, REQUEST_METHOD, COUNT(*) AS CNT, MIN(TIME_US) / 1000 AS TIME_ELAPSED_MS_MIN, MAX(TIME_US) / 1000 AS TIME_ELAPSED_MS_MAX, AVG(TIME_US) / 1000 AS TIME_ELAPSED_MS_AVG FROM APACHE_LOGS GROUP BY REQUEST_URL, REQUEST_METHOD HAVING AVG(TIME_US) / 1000 >= 300
    ```
    
    Please refer to the task count of each component in the screenshot. The
    task count of each component is 5 unless the stream is repartitioned due
    to aggregation.
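    
    The propagation rule can be modeled with a small Java sketch.
    `ParallelismPropagation`, `taskCounts`, and the `afterRepartition`
    parameter are hypothetical; the task count used after a repartition is an
    assumption supplied by the caller, not something this patch defines, and
    the operator names are placeholders rather than actual component names.
    
    ```java
    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.Set;

    // Hypothetical model, not StormSQL or Trident code: it only tracks which
    // task count each operator would inherit along a linear pipeline.
    final class ParallelismPropagation {
        // Walks the operators in order. Each one inherits the current task count;
        // an operator listed in `repartitioning` switches to `afterRepartition`
        // (an assumed value) for itself and everything downstream.
        static Map<String, Integer> taskCounts(int scanParallelism, int afterRepartition,
                                               String[] operators, Set<String> repartitioning) {
            Map<String, Integer> counts = new LinkedHashMap<>();
            int current = scanParallelism;
            for (String op : operators) {
                if (repartitioning.contains(op)) {
                    current = afterRepartition;
                }
                counts.put(op, current);
            }
            return counts;
        }

        public static void main(String[] args) {
            String[] operators = {"scan", "filter", "project", "aggregate", "insert"};
            Set<String> repartitioning = new HashSet<>(Arrays.asList("aggregate"));
            // PARALLELISM 5 on the input table; 1 after the repartition is assumed.
            System.out.println(taskCounts(5, 1, operators, repartitioning));
        }
    }
    ```
    
    In this model the output table never sets a task count of its own; it
    simply inherits whatever the upstream operator has, which mirrors the
    choice not to apply parallelism to the output table.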

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HeartSaVioR/storm STORM-1443-on-top-of-STORM-1446

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/1739.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1739
    
----
commit a6fdf67547a4bf45b7892256e3ca8eb272dcd29c
Author: Jungtaek Lim <kabh...@gmail.com>
Date:   2016-10-13T10:00:10Z

    STORM-1446 Compile the Calcite logical plan to Storm Trident logical plan
    
    * Port SamzaSQL implementation to Storm
      * https://github.com/milinda/samza-sql
    * Apply some rules to optimize
    * optimize Calc
      * merge filter and projection scripts into one
      * also applying short circuit
    * Modify Trident unit tests to use new query planner
    * arrange some files
      * Move some files which are only used from standalone
      * Remove some files which are no longer used
    * guard the possibility of stack overflow error on explaining
      * just leave error logs, and print out empty plan and continue
      * reported this behavior to Calcite community
    * leave some comments to clarify what it means

commit 319479bc7d8add43ffea0370d1762c19b705c72b
Author: Jungtaek Lim <kabh...@gmail.com>
Date:   2016-10-19T09:25:53Z

    STORM-1443 Support customizing parallelism in StormSQL
    
    * Add 'PARALLELISM' to table definition
      * default value is 1
    * Set parallelism to new stream while creating stream with scan
      * downstream operators will also have same parallelism unless repartitioned
      * not apply parallelism to output table since it can trigger repartition

----


