Paul Rogers created DRILL-5152:
----------------------------------

             Summary: Enhance the mock data source: better data, SQL access
                 Key: DRILL-5152
                 URL: https://issues.apache.org/jira/browse/DRILL-5152
             Project: Apache Drill
          Issue Type: Improvement
          Components: Tools, Build & Test
    Affects Versions: 1.9.0
            Reporter: Paul Rogers
            Assignee: Paul Rogers
            Priority: Minor


Drill provides a mock data storage engine that generates random data. The mock 
engine is used in some older unit tests that need a volume of data, but that 
are not too particular about the details of the data.

The mock data source remains useful even for modern tests. For example, the 
work in the external storage batch requires tests with varying amounts of 
data; the exact form of the data is not important, only the quantity. To 
ensure that spilling happens at various trigger points, a test must read the 
right amount of data for each trigger.

The existing mock data source has two limitations:

1. It generates only "black/white" (alternating) values, which is awkward for 
use in sorting.
2. The mock generator is accessible only from a physical plan, not from SQL 
queries.

This enhancement proposes to fix both limitations:

1. Generate a uniform, randomly distributed set of values.
2. Provide an encoding that lets a SQL query specify the data to be generated.
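
To illustrate why the first limitation matters for sort tests, here is a 
minimal Python sketch. It is not Drill code: the function names, the seed, and 
the 32-bit value range are assumptions made for illustration only. It contrasts 
an alternating generator with a uniform random one by counting the distinct 
values each produces.

```python
import random

def alternating_values(n, low=0, high=1):
    """Hypothetical stand-in for the existing alternating generator."""
    return [low if i % 2 == 0 else high for i in range(n)]

def uniform_values(n, seed=None, low=-2**31, high=2**31 - 1):
    """Hypothetical stand-in for the proposed generator.

    Draws n integers uniformly from an assumed 32-bit signed range.
    """
    rng = random.Random(seed)
    return [rng.randint(low, high) for _ in range(n)]

alt = alternating_values(10_000)
uni = uniform_values(10_000, seed=42)

# The alternating data has only two distinct keys, so a sort has little to do.
print("distinct alternating values:", len(set(alt)))
# The uniform data has almost as many distinct keys as rows.
print("distinct uniform values:", len(set(uni)))
```

With only two distinct keys, a sort test exercises very little of the 
comparison and merge logic; uniformly distributed keys give the sort 
realistic work.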

Example SQL query:
{code}
SELECT id_i, name_s50 FROM `mock`.employee_10K;
{code}

The above says to generate two fields, an INTEGER (the "_i" suffix) and a 
VARCHAR(50) (the "_s50" suffix), and to generate 10,000 rows (the "_10K" 
suffix on the table name).
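
The naming convention above could be decoded roughly as follows. This is a 
Python sketch under stated assumptions, not the Drill implementation: the 
function names and regular expressions are hypothetical, and only the suffixes 
shown in the example ("_i", "_s50", "_10K") are handled. Treating a table 
suffix without "K" as a plain row count is also an assumption.

```python
import re

# Column suffixes taken from the example: "_i" and "_s<width>".
COLUMN_RE = re.compile(r"(?P<base>\w+)_(?:(?P<int>i)|s(?P<width>\d+))")
# Table suffix taken from the example: "_<count>K". A missing "K" is assumed
# to mean a plain row count.
TABLE_RE = re.compile(r"(?P<base>\w+)_(?P<count>\d+)(?P<unit>K?)")

def parse_column(name):
    """Return (base_name, sql_type) for a name such as 'id_i' or 'name_s50'."""
    m = COLUMN_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"unrecognized column encoding: {name!r}")
    if m.group("int"):
        return m.group("base"), "INTEGER"
    return m.group("base"), f"VARCHAR({int(m.group('width'))})"

def parse_table(name):
    """Return (base_name, row_count) for a name such as 'employee_10K'."""
    m = TABLE_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"unrecognized table encoding: {name!r}")
    multiplier = 1_000 if m.group("unit") == "K" else 1
    return m.group("base"), int(m.group("count")) * multiplier

print(parse_column("id_i"))
print(parse_column("name_s50"))
print(parse_table("employee_10K"))
```

Keeping the whole specification in the identifiers means a test needs no 
separate schema file: the query text alone describes the data to generate.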



