Re: Running cartesian joins on Drill

2017-05-08 Thread Muhammad Gelbana
​I believe ​clhubert is referring to this discussion

.

So why Drill doesn't transform this query into a nested join query ? Simply
because there is no Calcite rule to transform it into a nested loop join ?
Is it not technically possible to write such Rule or is it feasible so I
may take on this challenge ?

Also pardon me for repeating my question but I fail to find an answer in
your replies, why doesn't Drill just run a cartesian join ? Because it's
expensive regarding resources (i.e. CPU\Network\RAM) ?

Thanks a lot Shadi for the query, it works for me.

*-*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana

On Mon, May 8, 2017 at 6:10 AM, Shadi Khalifa  wrote:

> Hi Muhammad,
>
> I did the following as a workaround to have Cartesian product. The basic
> idea is to create a dummy column on the fly that has the value 1 in both
> tables and then join on that column leading to having a match of every row
> of the first table with every row of the second table, hence do a Cartesian
> product. This might not be the most efficient way but it will do the job.
>
> *Original Query:*
> SELECT * FROM
> ( SELECT 'ABC' `UserID` FROM `dfs`.`path_to_parquet_file` tc LIMIT
> 2147483647) `t0`
> INNER JOIN
> ( SELECT 'ABC' `UserID` FROM `dfs`.`path_to_parquet_file` tc LIMIT
> 2147483647) `t1`
> ON (`t0`.`UserID` IS NOT DISTINCT FROM `t1`.`UserID`)
> LIMIT 2147483647
>
> *Workaround (add columns **d1a381f3g73 and **d1a381f3g74 to tables one
> and two, respectively. Names don't really matter, just need to be unique):*
> SELECT * FROM
> ( SELECT *1 as d1a381f3g73*, 'ABC' `UserID` FROM
> `dfs`.`path_to_parquet_file` tc LIMIT 2147483647) `t0`
> INNER JOIN
> ( SELECT *1 as d1a381f3g74*, 'ABC' `UserID` FROM
> `dfs`.`path_to_parquet_file` tc LIMIT 2147483647) `t1`
> ON (`t0`.*d1a381f3g73 = *`t1`.*d1a381f3g74*)
> WHERE `t0`.`UserID` IS NOT DISTINCT FROM `t1`.`UserID`
> LIMIT 2147483647
>
> Regards
>
>
> *Shadi Khalifa, PhD*
> Postdoctoral Fellow
> Cognitive Analytics Development Hub
> Centre for Advanced Computing
> Queen’s University
> (613) 533-6000 x78347
> http://cac.queensu.ca
>
> I'm just a neuron in the society collective brain
>
> *Join us for HPCS in June 2017! Register at:*  *http://2017.hpcs.ca/
> *
>
> P Please consider your environmental responsibility before printing this
> e-mail
>
> *01001001 0010 01101100 0110 01110110 01100101 0010 01000101
> 01100111 0001 0111 01110100 *
>
> *The information transmitted is intended only for the person or entity to
> which it is addressed and may contain confidential material. Any review or
> dissemination of this information by persons other than the intended
> recipient is prohibited. If you received this in error, please contact the
> sender and delete the material from any computer. Thank you.*
>
>
>
> On Saturday, May 6, 2017 6:05 PM, Muhammad Gelbana 
> wrote:
>
>
> ​​
> Here it is:
>
> SELECT * FROM (SELECT 'ABC' `UserID` FROM `dfs`.`path_to_parquet_file` tc
> LIMIT 2147483647) `t0` INNER JOIN (SELECT 'ABC' `UserID` FROM
> `dfs`.`path_to_parquet_file` tc LIMIT 2147483647) `t1` ON (
> ​​
> `t0`.`UserID` IS NOT DISTINCT FROM
> ​​
> `t1`.`UserID`) LIMIT 2147483647
>
> I debugged Drill code and found it decomposes *IS NOT DISTINCT FROM* into
> ​
> *`t0`.`UserID` = ​`t1`.`UserID` OR (`t0`.`UserID` IS NULL && `t1`.`UserID`
> IS NULL**)* while checking if the query is a cartesian join, and when the
> check returns true, it throws an excetion saying: *This query cannot be
> planned possibly due to either a cartesian join or an inequality join*
>
>
> *-*
> *Muhammad Gelbana*
> http://www.linkedin.com/in/mgelbana
>
> On Sat, May 6, 2017 at 6:53 PM, Gautam Parai  wrote:
>
> > Can you please specify the query you are trying to execute?
> >
> >
> > Gautam
> >
> > 
> > From: Muhammad Gelbana 
> > Sent: Saturday, May 6, 2017 7:34:53 AM
> > To: u...@drill.apache.org; dev@drill.apache.org
> > Subject: Running cartesian joins on Drill
> >
> > Is there a reason why Drill would intentionally reject cartesian join
> > queries even if *planner.enable_nljoin_for_scalar_only* is disabled ?
> >
> > Any ideas how could a query be rewritten to overcome this restriction ?
> >
> > *-*
> > *Muhammad Gelbana*
> > http://www.linkedin.com/in/mgelbana
> >
>
>
>


Re: Running cartesian joins on Drill

2017-05-08 Thread Zelaine Fong
Cartesian joins in Drill are implemented as nested loop joins, and I think you 
should see that reflected in the resultant query plan when you run explain plan 
on the query.

Yes, Cartesian joins/nested loop joins are expensive because you’re effectively 
doing an MxN read of your tables.  There are more efficient ways of processing 
a nested loop join, e.g., by creating an index on the larger table in the join 
and then using that index to do lookups into that table.  That way, the nested 
loop join cost is the cost of creating the index + M, where M is the number of 
rows in the smaller table and assuming the lookup cost into the index does 
minimize the amount of data read of the second table.  Drill currently doesn’t 
do this.

-- Zelaine

On 5/8/17, 9:09 AM, "Muhammad Gelbana"  wrote:

​I believe ​clhubert is referring to this discussion


.

So why Drill doesn't transform this query into a nested join query ? Simply
because there is no Calcite rule to transform it into a nested loop join ?
Is it not technically possible to write such Rule or is it feasible so I
may take on this challenge ?

Also pardon me for repeating my question but I fail to find an answer in
your replies, why doesn't Drill just run a cartesian join ? Because it's
expensive regarding resources (i.e. CPU\Network\RAM) ?

Thanks a lot Shadi for the query, it works for me.

*-*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana

On Mon, May 8, 2017 at 6:10 AM, Shadi Khalifa  wrote:

> Hi Muhammad,
>
> I did the following as a workaround to have Cartesian product. The basic
> idea is to create a dummy column on the fly that has the value 1 in both
> tables and then join on that column leading to having a match of every row
> of the first table with every row of the second table, hence do a 
Cartesian
> product. This might not be the most efficient way but it will do the job.
>
> *Original Query:*
> SELECT * FROM
> ( SELECT 'ABC' `UserID` FROM `dfs`.`path_to_parquet_file` tc LIMIT
> 2147483647) `t0`
> INNER JOIN
> ( SELECT 'ABC' `UserID` FROM `dfs`.`path_to_parquet_file` tc LIMIT
> 2147483647) `t1`
> ON (`t0`.`UserID` IS NOT DISTINCT FROM `t1`.`UserID`)
> LIMIT 2147483647
>
> *Workaround (add columns **d1a381f3g73 and **d1a381f3g74 to tables one
> and two, respectively. Names don't really matter, just need to be 
unique):*
> SELECT * FROM
> ( SELECT *1 as d1a381f3g73*, 'ABC' `UserID` FROM
> `dfs`.`path_to_parquet_file` tc LIMIT 2147483647) `t0`
> INNER JOIN
> ( SELECT *1 as d1a381f3g74*, 'ABC' `UserID` FROM
> `dfs`.`path_to_parquet_file` tc LIMIT 2147483647) `t1`
> ON (`t0`.*d1a381f3g73 = *`t1`.*d1a381f3g74*)
> WHERE `t0`.`UserID` IS NOT DISTINCT FROM `t1`.`UserID`
> LIMIT 2147483647
>
> Regards
>
>
> *Shadi Khalifa, PhD*
> Postdoctoral Fellow
> Cognitive Analytics Development Hub
> Centre for Advanced Computing
> Queen’s University
> (613) 533-6000 x78347
> http://cac.queensu.ca
>
> I'm just a neuron in the society collective brain
>
> *Join us for HPCS in June 2017! Register at:*  *http://2017.hpcs.ca/
> *
>
> P Please consider your environmental responsibility before printing this
> e-mail
>
> *01001001 0010 01101100 0110 01110110 01100101 0010 01000101
> 01100111 0001 0111 01110100 *
>
> *The information transmitted is intended only for the person or entity to
> which it is addressed and may contain confidential material. Any review or
> dissemination of this information by persons other than the intended
> recipient is prohibited. If you received this in error, please contact the
> sender and delete the material from any computer. Thank you.*
>
>
>
> On Saturday, May 6, 2017 6:05 PM, Muhammad Gelbana 
> wrote:
>
>
> ​​
> Here it is:
>
> SELECT * FROM (SELECT 'ABC' `UserID` FROM `dfs`.`path_to_parquet_file` tc
> LIMIT 2147483647) `t0` INNER JOIN (SELECT 'ABC' `UserID` FROM
> `dfs`.`path_to_parquet_file` tc LIMIT 2147483647) `t1` ON (
> ​​
> `t0`.`UserID` IS NOT DISTINCT FROM
> ​​
> `t1`.`UserID`) LIMIT 2147483647
>
> I debugged Drill code and found it decomposes *IS NOT DISTINCT FROM* into
> ​
> *`t0`.`UserID` = ​`t1`.`UserID` OR (`t0`.`UserID` IS NULL && `t1`.`UserID`
> IS NULL**)* while checking if the query is a cartesian join, and when the
> check returns true, it throws an excetion saying: *This query cannot be
> planned possibly due to either a cartesian join or an inequality join*
>
>
> *-*
> *Muhammad Gelbana*
> http://www.l

Re: Running cartesian joins on Drill

2017-05-08 Thread Shadi Khalifa
Hi Muhammad,
I did the following as a workaround to have Cartesian product. The basic idea 
is to create a dummy column on the fly that has the value 1 in both tables and 
then join on that column leading to having a match of every row of the first 
table with every row of the second table, hence do a Cartesian product. This 
might not be the most efficient way but it will do the job.
Original Query:SELECT * FROM ( SELECT 'ABC' `UserID` FROM 
`dfs`.`path_to_parquet_file` tc LIMIT 2147483647) `t0` INNER JOIN ( SELECT 
'ABC' `UserID` FROM `dfs`.`path_to_parquet_file` tc LIMIT 2147483647) `t1` ON 
(`t0`.`UserID` IS NOT DISTINCT FROM `t1`.`UserID`) LIMIT 2147483647
Workaround (add columns d1a381f3g73 and d1a381f3g74 to tables one and two, 
respectively. Names don't really matter, just need to be unique):SELECT * FROM 
( SELECT 1 as d1a381f3g73, 'ABC' `UserID` FROM `dfs`.`path_to_parquet_file` tc 
LIMIT 2147483647) `t0` INNER JOIN ( SELECT 1 as d1a381f3g74, 'ABC' `UserID` 
FROM `dfs`.`path_to_parquet_file` tc LIMIT 2147483647) `t1` ON 
(`t0`.d1a381f3g73 = `t1`.d1a381f3g74)WHERE `t0`.`UserID` IS NOT DISTINCT FROM 
`t1`.`UserID`LIMIT 2147483647 Regards 

 Shadi Khalifa, PhD Postdoctoral Fellow Cognitive Analytics Development Hub 
Centre for Advanced Computing Queen’s University (613) 533-6000 x78347 
http://cac.queensu.ca  I'm just a neuron in thesociety collective brain 
Join us for HPCS in June 2017! Register at:  http://2017.hpcs.ca/
 P Please consider yourenvironmental responsibility before printing this e-mail 
   01001001 0010 01101100 0110 0111011001100101 0010 01000101 
01100111 0001 0111 01110100 
  The information transmitted is intended only forthe person or entity to which 
it is addressed and may contain confidential material.Any review or 
dissemination of this information by persons other than theintended recipient 
is prohibited. If you received this in error, please contactthe sender and 
delete the material from any computer. Thank you.
 

On Saturday, May 6, 2017 6:05 PM, Muhammad Gelbana  
wrote:
 

 ​​
Here it is:

SELECT * FROM (SELECT 'ABC' `UserID` FROM `dfs`.`path_to_parquet_file` tc
LIMIT 2147483647) `t0` INNER JOIN (SELECT 'ABC' `UserID` FROM
`dfs`.`path_to_parquet_file` tc LIMIT 2147483647) `t1` ON (
​​
`t0`.`UserID` IS NOT DISTINCT FROM
​​
`t1`.`UserID`) LIMIT 2147483647

I debugged Drill code and found it decomposes *IS NOT DISTINCT FROM* into
​
*`t0`.`UserID` = ​`t1`.`UserID` OR (`t0`.`UserID` IS NULL && `t1`.`UserID`
IS NULL**)* while checking if the query is a cartesian join, and when the
check returns true, it throws an excetion saying: *This query cannot be
planned possibly due to either a cartesian join or an inequality join*


*-*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana

On Sat, May 6, 2017 at 6:53 PM, Gautam Parai  wrote:

> Can you please specify the query you are trying to execute?
>
>
> Gautam
>
> 
> From: Muhammad Gelbana 
> Sent: Saturday, May 6, 2017 7:34:53 AM
> To: u...@drill.apache.org; dev@drill.apache.org
> Subject: Running cartesian joins on Drill
>
> Is there a reason why Drill would intentionally reject cartesian join
> queries even if *planner.enable_nljoin_for_scalar_only* is disabled ?
>
> Any ideas how could a query be rewritten to overcome this restriction ?
>
> *-*
> *Muhammad Gelbana*
> http://www.linkedin.com/in/mgelbana
>

   

Re: Running cartesian joins on Drill

2017-05-08 Thread Gautam Parai
Can you please specify the query you are trying to execute?


Gautam


From: Muhammad Gelbana 
Sent: Saturday, May 6, 2017 7:34:53 AM
To: u...@drill.apache.org; dev@drill.apache.org
Subject: Running cartesian joins on Drill

Is there a reason why Drill would intentionally reject cartesian join
queries even if *planner.enable_nljoin_for_scalar_only* is disabled ?

Any ideas how could a query be rewritten to overcome this restriction ?

*-*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana


[jira] [Created] (DRILL-5486) Compliant text reader tries, but fails, to ignore empty rows

2017-05-08 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-5486:
--

 Summary: Compliant text reader tries, but fails, to ignore empty 
rows
 Key: DRILL-5486
 URL: https://issues.apache.org/jira/browse/DRILL-5486
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Paul Rogers
Priority: Minor


The "compliant" text reader (for CSV files, etc.) has two modes: reading with 
headers (which creates a scalar vector per file column), or without headers 
(that creates a single {{columns}} vector with an array of all columns.

When run in array mode, the code uses a class called {{RepeatedVarCharOutput}}. 
This class attempts to ignore empty records:

{code}
  public void finishRecord() {
...
// if there were no defined fields, skip.
if (fieldIndex > -1) {
  batchIndex++;
  recordCount++;
}
...
{code}

As it turns out, this works only on the *first* row. On the first row, the 
{{fieldIndex}} has its initial value of -1. But, for subsequent records, 
{{fieldIndex}} is never reset and retains its value from the last field set.

The result is that the code skips the first empty row, but not empty rows 
elsewhere in the file.

Further, on the first row, the logic is flawed. The part shown above is only 
for the row counts. Here is more of the logic:

{code}
  @Override
  public void finishRecord() {
recordStart = characterData;
...
int newOffset = ((int) (charLengthOffset - charLengthOffsetOriginal))/4;
PlatformDependent.putInt(repeatedOffset, newOffset);
repeatedOffset += 4;

// if there were no defined fields, skip.
if (fieldIndex > -1) {
  batchIndex++;
  recordCount++;
}
  }
{code}

Note that the underlying vector *is* bumped for the record, even though the row 
count is not. Later, this will cause the container to have one less record.

Suppose we have this data:
[]
[abc, def]

The above bug leaves the data in this form:

[]

Why? We counted only one record (the second), but we created an entry for the 
first (empty) record. When we set row count, we set it to 1, which is only the 
first. The result is the loss of the second row.

Or, that would be true if the row count ({{batchIndex}}) was used for the batch 
row count. But, it is not, the next level of code keeps a separate count, so 
the "skip empty row" logic causes the counts to get out of sync between the 
{{RepeatedVarCharOutput}} and the next level of reader.

In sense, there are multiple self-cancelling bugs:

* The skip-empty-row logic is not synchronized with the upper layer.
* That bug is partly masked because the logic is not triggered except on the 
first row.
* That bug is masked because we always set the row offset, even if we ignore 
the row in the row count.
* That bug is masked because the incorrect row count is ignored when setting 
the container row count.




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (DRILL-5487) Vector corruption in CSV with headers and truncated last row

2017-05-08 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-5487:
--

 Summary: Vector corruption in CSV with headers and truncated last 
row
 Key: DRILL-5487
 URL: https://issues.apache.org/jira/browse/DRILL-5487
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Paul Rogers


The CSV format plugin allows two ways of reading data:

* As named columns
* As a single array, called {{columns}}, that holds all columns for a row

The named columns feature will corrupt the offset vectors if the last row of 
the file is truncated: leaves off one or more columns.

To illustrate the CSV data corruption, I created a CSV file, test4.csv, of the 
following form:

{code}
h,u
abc,def
ghi
{code}

Note that the file is truncated: the command and second field is missing on the 
last line.

Then, I created a simple test using the "cluster fixture" framework:

{code}
  @Test
  public void readerTest() throws Exception {
FixtureBuilder builder = ClusterFixture.builder()
.maxParallelization(1);

try (ClusterFixture cluster = builder.build();
 ClientFixture client = cluster.clientFixture()) {
  TextFormatConfig csvFormat = new TextFormatConfig();
  csvFormat.fieldDelimiter = ',';
  csvFormat.skipFirstLine = false;
  csvFormat.extractHeader = true;
  cluster.defineWorkspace("dfs", "data", "/tmp/data", "csv", csvFormat);
  String sql = "SELECT * FROM `dfs.data`.`csv/test4.csv` LIMIT 10";
  client.queryBuilder().sql(sql).printCsv();
}
  }
{code}

The results show we've got a problem:

{code}
Exception (no rows returned): 
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
IllegalArgumentException: length: -3 (expected: >= 0)
{code}

If the last line were:

{code}
efg,
{code}

Then the offset vector should look like this:

{code}
[0, 3, 3]
{code}

Very likely we have an offset vector that looks like this instead:

{code}
[0, 3, 0]
{code}

When we compute the second column of the second row, we should compute:

{code}
length = offset[2] - offset[1] = 3 - 3 = 0
{code}

Instead we get:

{code}
length = offset[2] - offset[1] = 0 - 3 = -3
{code}

The summary is that a premature EOF appears to cause the "missing" columns to 
be skipped; they are not filled with a blank value to "bump" the offset vectors 
to fill in the last row. Instead, they are left at 0, causing havoc downstream 
in the query.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (DRILL-5488) Useless code in VectorTrimmer

2017-05-08 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-5488:
--

 Summary: Useless code in VectorTrimmer
 Key: DRILL-5488
 URL: https://issues.apache.org/jira/browse/DRILL-5488
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Paul Rogers
Priority: Trivial


Consider this code in a generated fixed-width vector, such as UInt4Vector:

{code}
@Override
public void setValueCount(int valueCount) {
  ...
  final int idx = (VALUE_WIDTH * valueCount);
  ...
  VectorTrimmer.trim(data, idx);
  data.writerIndex(valueCount * VALUE_WIDTH);
}
{code}

Consider the {{trim()}} method:

{code}
public class VectorTrimmer {
  ...
  public static void trim(ByteBuf data, int idx) {
data.writerIndex(idx);
if (data instanceof DrillBuf) {
  // data.capacity(idx);
  data.writerIndex(idx);
}
  }
}
{code}

This method is called {{trim}}, but it actually sets the writer index in the 
buffer (though we never use that index.) Since all buffers we use are 
{{DrillBuf}}, the if-statement is a no-op: we simply set the writer index twice.

It seems this code can simply be discarded: it is called from only two places; 
neither of which end up using the writer index.




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] drill pull request #809: Drill-4335: C++ client changes for supporting encry...

2017-05-08 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/809#discussion_r114778278
  
--- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp ---
@@ -495,26 +612,45 @@ connectionStatus_t 
DrillClientImpl::handleAuthentication(const DrillUserProperti
 }
 }
 
+std::stringstream errorMsg;
--- End diff --

"logMsg" would be a better name than "errorMsg" because this is also used 
in a successful path.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #809: Drill-4335: C++ client changes for supporting encry...

2017-05-08 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/809#discussion_r114828848
  
--- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp ---
@@ -854,75 +990,328 @@ void DrillClientImpl::waitForResults(){
 }
 }
 
-status_t DrillClientImpl::readMsg(ByteBuf_t _buf,
-AllocatedBufferPtr* allocatedBuffer,
+/*
+ *  Decode the length of the message from bufWithLen and then read entire 
message from the socket.
+ *  Parameters:
+ *  bufWithLen  - in param  - buffer containing the bytes 
which has length of the RPC message/encrypted chunk
+ *  bufferWithLenBytes  - out param - buffer pointer which points to 
memory allocated in this function and has the
+ *entire one RPC message / 
encrypted chunk along with the length of the message
+ *  lengthBytesLength   - out param - bytes of bufWithLen which 
contains the length of the entire RPC message or
--- End diff --

How about 'bufWithLenSize'?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #809: Drill-4335: C++ client changes for supporting encry...

2017-05-08 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/809#discussion_r115372156
  
--- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp ---
@@ -854,75 +990,328 @@ void DrillClientImpl::waitForResults(){
 }
 }
 
-status_t DrillClientImpl::readMsg(ByteBuf_t _buf,
-AllocatedBufferPtr* allocatedBuffer,
+/*
+ *  Decode the length of the message from bufWithLen and then read entire 
message from the socket.
+ *  Parameters:
+ *  bufWithLen  - in param  - buffer containing the bytes 
which has length of the RPC message/encrypted chunk
+ *  bufferWithLenBytes  - out param - buffer pointer which points to 
memory allocated in this function and has the
+ *entire one RPC message / 
encrypted chunk along with the length of the message
+ *  lengthBytesLength   - out param - bytes of bufWithLen which 
contains the length of the entire RPC message or
+ *encrypted chunk
+ *  lengthDecodeHandler - in param  - function pointer with length 
decoder to use. For encrypted chunk we use
+ *lengthDecode and for plain RPC 
message we use rpcLengthDecode.
+ *  Return:
+ *  status_t- QRY_SUCCESS- In case of success.
+ *  - 
QRY_COMM_ERROR/QRY_INTERNAL_ERROR/QRY_CLIENT_OUTOFMEM - In cases of error.
+ */
+status_t DrillClientImpl::readLenBytesFromSocket(ByteBuf_t bufWithLen, 
AllocatedBufferPtr* bufferWithLenBytes,
+   uint32_t& lengthBytesLength, lengthDecoder lengthDecodeHandler) 
{
+
+uint32_t rmsgLen = 0;
+size_t bytes_read = 0;
+size_t leftover = 0;
+boost::system::error_code error;
+*bufferWithLenBytes = NULL;
+size_t bufferWithLenBytesSize = 0;
+
+bytes_read = (this->*lengthDecodeHandler)(bufWithLen, rmsgLen);
+lengthBytesLength = bytes_read;
+
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Length bytes = " << bytes_read 
<< std::endl;)
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Msg Length = " << rmsgLen << 
std::endl;)
+
+if(rmsgLen>0){
+leftover = LEN_PREFIX_BUFLEN - bytes_read;
+
+// Allocate a buffer for reading all the bytes in bufWithLen and 
length number of bytes
+   bufferWithLenBytesSize = rmsgLen + bytes_read;
+*bufferWithLenBytes = new AllocatedBuffer(bufferWithLenBytesSize);
+
+if(*bufferWithLenBytes == NULL){
+return handleQryError(QRY_CLIENT_OUTOFMEM, 
getMessage(ERR_QRY_OUTOFMEM), NULL);
+}
+
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << 
"DrillClientImpl::readLenBytesFromSocket: Allocated and locked buffer: [ "
+  << *bufferWithLenBytes << ", 
size = " << bufferWithLenBytesSize << " ]\n";)
+
+// Copy the memory of bufWithLen into bufferWithLenBytesSize
+memcpy((*bufferWithLenBytes)->m_pBuffer, bufWithLen, 
LEN_PREFIX_BUFLEN);
+
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Copied bufWithLen into 
bufferWithLenBytes. "
+  << "Now reading data (rmsgLen - 
leftover) : " << (rmsgLen - leftover)
+  << std::endl;)
+
+// Read the entire data left from socket and copy to currentBuffer.
+ByteBuf_t b = (*bufferWithLenBytes)->m_pBuffer + LEN_PREFIX_BUFLEN;
+size_t bytesToRead = rmsgLen - leftover;
+
+while(1){
+bytes_read = this->m_socket.read_some(boost::asio::buffer(b, 
bytesToRead), error);
+if(error) break;
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Data Message: actual 
bytes read = " << bytes_read << std::endl;)
+if(bytes_read == bytesToRead) break;
+bytesToRead -= bytes_read;
+b += bytes_read;
+}
+} else {
+return handleQryError(QRY_INTERNAL_ERROR, 
getMessage(ERR_QRY_INVREADLEN), NULL);
+}
+
+return error ? handleQryError(QRY_COMM_ERROR, 
getMessage(ERR_QRY_COMMERR, error.message().c_str()), NULL)
+ : QRY_SUCCESS;
+}
+
+
+/*
+ *  Function to read entire RPC message from socket and decode it to 
InboundRpcMessage
+ *  Parameters:
+ *  _buf- in param  - Buffer containing the length bytes.
+ *  allocatedBuffer - out param - Buffer containing the length bytes 
and entire RPC message bytes.
+ *  msg - out param - Decoded InBoundRpcMessage from the 
bytes in allocatedBuffer
+ *  Return:
+ *  status_t- QRY_SUCCESS   - In case of success.
+ *  - 
QRY_COMM_ERROR/QRY_INTERNAL_ERROR/QRY_CLIENT_OUTOFMEM 

[GitHub] drill pull request #809: Drill-4335: C++ client changes for supporting encry...

2017-05-08 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/809#discussion_r11272
  
--- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp ---
@@ -245,6 +264,64 @@ connectionStatus_t 
DrillClientImpl::sendSync(rpc::OutBoundRpcMessage& msg){
 }
 }
 
+/*
+ * Send handler for sending encrypted messages over wire. It encrypts the 
send buffer using wrap api provided by
+ * saslAuthenticatorImpl and then transmit the encrypted bytes over wire.
+ *
+ * Return:
+ *  connectionStatus_t  -   CONN_SUCCESS - In case of successful send
+ *  -   CONN_FAILURE - In case of failure to send
+ */
+connectionStatus_t DrillClientImpl::sendSyncEncrypted() {
+
+boost::system::error_code ec;
+
+// Split the encoded message into the rawWrapSendSize and then encrypt 
each chunk. Each encrypted chunk along with
+// its encrypted length in network order (added by Cyrus-SASL plugin) 
is sent over wire.
+const int wrapChunkSize = m_encryptionCtxt.getRawWrapSendSize();
+int numChunks = ceil((double)m_wbuf.size() / wrapChunkSize);
+int lengthToEncrypt = m_wbuf.size();
+int currentChunkLen = std::min(wrapChunkSize, lengthToEncrypt);
+uint32_t startIndex = 0, wrappedLen = 0;
+const char* wrappedChunk = NULL;
+std::stringstream errorMsg;
+
+// Encrypt and send each chunk
+while(numChunks > 0) {
+int wrapResult = m_saslAuthenticator->wrap(reinterpret_cast(m_wbuf.data() + startIndex),
+   currentChunkLen, 
&wrappedChunk, wrappedLen);
+if(SASL_OK != wrapResult) {
+errorMsg << "Sasl wrap failed while encrypting chunk of 
length: " << currentChunkLen << " , EncodeError: "
+ << wrapResult;
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << 
"DrillClientImpl::sendSyncEncrypted - " << errorMsg.str()
+  << " ,StartIndex: " << 
startIndex << ". ChunkNum: " << numChunks
+  << ", Closing connection.";)
+return handleConnError(CONN_FAILURE, 
getMessage(ERR_CONN_WFAIL, errorMsg.str().c_str()));
+}
+
+// Send the encrypted chunk.
+size_t s = m_socket.write_some(boost::asio::buffer(wrappedChunk, 
wrappedLen), ec);
+
+if(ec || s==0){
+errorMsg << "Failure while sending encrypted chunk. Error: " 
<< ec.message().c_str();
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << 
"DrillClientImpl::sendSyncEncrypted - " << errorMsg.str() << " Chunk:"
+  << numChunks << ", Original 
Length: " << currentChunkLen
+  << ", StartIndex:" << 
startIndex << ", Closing connection.";)
+return handleConnError(CONN_FAILURE, 
getMessage(ERR_CONN_WFAIL, errorMsg.str().c_str()));
--- End diff --

What happens to the memory allocated for wrappedChunk when this path is 
taken? Does handleConnError() deal with it ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #809: Drill-4335: C++ client changes for supporting encry...

2017-05-08 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/809#discussion_r114775522
  
--- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp ---
@@ -370,6 +453,33 @@ void DrillClientImpl::handleHShakeReadTimeout(const 
boost::system::error_code &
 return;
 }
 
+/*
+ * Check's if client has explicitly expressed interest in encrypted 
connections only. It looks for USERPROP_ENCRYPTION
+ * connection string property. If set to true then returns true else 
returns false
+ */
+bool DrillClientImpl::clientNeedsEncryption(const DrillUserProperties* 
userProperties) {
+bool needsEncryption = false;
+// check if userProperties is null
+if(!userProperties) {
+return needsEncryption;
+}
+
+// Loop through the property to find USERPROP_ENCRYPTION and it's value
+for (size_t i = 0; i < userProperties->size(); i++) {
--- End diff --

Not related to your change: I wonder why  DrillUserProperties was not 
implemented as a map?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #809: Drill-4335: C++ client changes for supporting encry...

2017-05-08 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/809#discussion_r115369469
  
--- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp ---
@@ -854,75 +990,328 @@ void DrillClientImpl::waitForResults(){
 }
 }
 
-status_t DrillClientImpl::readMsg(ByteBuf_t _buf,
-AllocatedBufferPtr* allocatedBuffer,
+/*
+ *  Decode the length of the message from bufWithLen and then read entire 
message from the socket.
+ *  Parameters:
+ *  bufWithLen  - in param  - buffer containing the bytes 
which has length of the RPC message/encrypted chunk
+ *  bufferWithLenBytes  - out param - buffer pointer which points to 
memory allocated in this function and has the
+ *entire one RPC message / 
encrypted chunk along with the length of the message
+ *  lengthBytesLength   - out param - bytes of bufWithLen which 
contains the length of the entire RPC message or
+ *encrypted chunk
+ *  lengthDecodeHandler - in param  - function pointer with length 
decoder to use. For encrypted chunk we use
+ *lengthDecode and for plain RPC 
message we use rpcLengthDecode.
+ *  Return:
+ *  status_t- QRY_SUCCESS- In case of success.
+ *  - 
QRY_COMM_ERROR/QRY_INTERNAL_ERROR/QRY_CLIENT_OUTOFMEM - In cases of error.
+ */
+status_t DrillClientImpl::readLenBytesFromSocket(ByteBuf_t bufWithLen, 
AllocatedBufferPtr* bufferWithLenBytes,
+   uint32_t& lengthBytesLength, lengthDecoder lengthDecodeHandler) 
{
+
+uint32_t rmsgLen = 0;
+size_t bytes_read = 0;
+size_t leftover = 0;
+boost::system::error_code error;
+*bufferWithLenBytes = NULL;
+size_t bufferWithLenBytesSize = 0;
+
+bytes_read = (this->*lengthDecodeHandler)(bufWithLen, rmsgLen);
+lengthBytesLength = bytes_read;
+
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Length bytes = " << bytes_read 
<< std::endl;)
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Msg Length = " << rmsgLen << 
std::endl;)
+
+if(rmsgLen>0){
+leftover = LEN_PREFIX_BUFLEN - bytes_read;
+
+// Allocate a buffer for reading all the bytes in bufWithLen and 
length number of bytes
+   bufferWithLenBytesSize = rmsgLen + bytes_read;
+*bufferWithLenBytes = new AllocatedBuffer(bufferWithLenBytesSize);
+
+if(*bufferWithLenBytes == NULL){
+return handleQryError(QRY_CLIENT_OUTOFMEM, 
getMessage(ERR_QRY_OUTOFMEM), NULL);
+}
+
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << 
"DrillClientImpl::readLenBytesFromSocket: Allocated and locked buffer: [ "
+  << *bufferWithLenBytes << ", 
size = " << bufferWithLenBytesSize << " ]\n";)
+
+// Copy the memory of bufWithLen into bufferWithLenBytesSize
+memcpy((*bufferWithLenBytes)->m_pBuffer, bufWithLen, 
LEN_PREFIX_BUFLEN);
+
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Copied bufWithLen into 
bufferWithLenBytes. "
+  << "Now reading data (rmsgLen - 
leftover) : " << (rmsgLen - leftover)
+  << std::endl;)
+
+// Read the entire data left from socket and copy to currentBuffer.
+ByteBuf_t b = (*bufferWithLenBytes)->m_pBuffer + LEN_PREFIX_BUFLEN;
+size_t bytesToRead = rmsgLen - leftover;
+
+while(1){
+bytes_read = this->m_socket.read_some(boost::asio::buffer(b, 
bytesToRead), error);
+if(error) break;
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Data Message: actual 
bytes read = " << bytes_read << std::endl;)
+if(bytes_read == bytesToRead) break;
+bytesToRead -= bytes_read;
+b += bytes_read;
+}
+} else {
+return handleQryError(QRY_INTERNAL_ERROR, 
getMessage(ERR_QRY_INVREADLEN), NULL);
+}
+
+return error ? handleQryError(QRY_COMM_ERROR, 
getMessage(ERR_QRY_COMMERR, error.message().c_str()), NULL)
--- End diff --

what happens to memory allocated for bufferWithLenBytes on error?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #809: Drill-4335: C++ client changes for supporting encry...

2017-05-08 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/809#discussion_r114828323
  
--- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp ---
@@ -854,75 +990,328 @@ void DrillClientImpl::waitForResults(){
 }
 }
 
-status_t DrillClientImpl::readMsg(ByteBuf_t _buf,
-AllocatedBufferPtr* allocatedBuffer,
+/*
+ *  Decode the length of the message from bufWithLen and then read entire 
message from the socket.
+ *  Parameters:
+ *  bufWithLen  - in param  - buffer containing the bytes 
which has length of the RPC message/encrypted chunk
+ *  bufferWithLenBytes  - out param - buffer pointer which points to 
memory allocated in this function and has the
--- End diff --

'bufferWithLenBytes' this name is confusing given that there is already a 
'bufWithLen'. How about something like 'bufWithRPCMsg'?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #809: Drill-4335: C++ client changes for supporting encry...

2017-05-08 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/809#discussion_r114818140
  
--- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp ---
@@ -854,75 +990,328 @@ void DrillClientImpl::waitForResults(){
 }
 }
 
-status_t DrillClientImpl::readMsg(ByteBuf_t _buf,
-AllocatedBufferPtr* allocatedBuffer,
+/*
+ *  Decode the length of the message from bufWithLen and then read entire 
message from the socket.
+ *  Parameters:
+ *  bufWithLen  - in param  - buffer containing the bytes 
which has length of the RPC message/encrypted chunk
+ *  bufferWithLenBytes  - out param - buffer pointer which points to 
memory allocated in this function and has the
+ *entire one RPC message / 
encrypted chunk along with the length of the message
+ *  lengthBytesLength   - out param - bytes of bufWithLen which 
contains the length of the entire RPC message or
+ *encrypted chunk
+ *  lengthDecodeHandler - in param  - function pointer with length 
decoder to use. For encrypted chunk we use
+ *lengthDecode and for plain RPC 
message we use rpcLengthDecode.
+ *  Return:
+ *  status_t- QRY_SUCCESS- In case of success.
+ *  - 
QRY_COMM_ERROR/QRY_INTERNAL_ERROR/QRY_CLIENT_OUTOFMEM - In cases of error.
+ */
+status_t DrillClientImpl::readLenBytesFromSocket(ByteBuf_t bufWithLen, 
AllocatedBufferPtr* bufferWithLenBytes,
+   uint32_t& lengthBytesLength, lengthDecoder lengthDecodeHandler) 
{
+
+uint32_t rmsgLen = 0;
+size_t bytes_read = 0;
+size_t leftover = 0;
+boost::system::error_code error;
+*bufferWithLenBytes = NULL;
+size_t bufferWithLenBytesSize = 0;
+
+bytes_read = (this->*lengthDecodeHandler)(bufWithLen, rmsgLen);
--- End diff --

What does it mean to access a local using the 'this' ptr?
Can't this be written as bytes_read = (lengthDecodeHandler)(bufWithLen, 
rmsgLen); ?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #809: Drill-4335: C++ client changes for supporting encry...

2017-05-08 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/809#discussion_r114623038
  
--- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp ---
@@ -854,75 +990,328 @@ void DrillClientImpl::waitForResults(){
 }
 }
 
-status_t DrillClientImpl::readMsg(ByteBuf_t _buf,
-AllocatedBufferPtr* allocatedBuffer,
+/*
+ *  Decode the length of the message from bufWithLen and then read entire 
message from the socket.
+ *  Parameters:
+ *  bufWithLen  - in param  - buffer containing the bytes 
which has length of the RPC message/encrypted chunk
+ *  bufferWithLenBytes  - out param - buffer pointer which points to 
memory allocated in this function and has the
+ *entire one RPC message / 
encrypted chunk along with the length of the message
+ *  lengthBytesLength   - out param - bytes of bufWithLen which 
contains the length of the entire RPC message or
+ *encrypted chunk
+ *  lengthDecodeHandler - in param  - function pointer with length 
decoder to use. For encrypted chunk we use
+ *lengthDecode and for plain RPC 
message we use rpcLengthDecode.
+ *  Return:
+ *  status_t- QRY_SUCCESS- In case of success.
+ *  - 
QRY_COMM_ERROR/QRY_INTERNAL_ERROR/QRY_CLIENT_OUTOFMEM - In cases of error.
+ */
+status_t DrillClientImpl::readLenBytesFromSocket(ByteBuf_t bufWithLen, 
AllocatedBufferPtr* bufferWithLenBytes,
+   uint32_t& lengthBytesLength, lengthDecoder lengthDecodeHandler) 
{
+
+uint32_t rmsgLen = 0;
+size_t bytes_read = 0;
+size_t leftover = 0;
+boost::system::error_code error;
+*bufferWithLenBytes = NULL;
+size_t bufferWithLenBytesSize = 0;
+
+bytes_read = (this->*lengthDecodeHandler)(bufWithLen, rmsgLen);
+lengthBytesLength = bytes_read;
+
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Length bytes = " << bytes_read 
<< std::endl;)
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Msg Length = " << rmsgLen << 
std::endl;)
+
+if(rmsgLen>0){
+leftover = LEN_PREFIX_BUFLEN - bytes_read;
+
+// Allocate a buffer for reading all the bytes in bufWithLen and 
length number of bytes
+   bufferWithLenBytesSize = rmsgLen + bytes_read;
+*bufferWithLenBytes = new AllocatedBuffer(bufferWithLenBytesSize);
+
+if(*bufferWithLenBytes == NULL){
+return handleQryError(QRY_CLIENT_OUTOFMEM, 
getMessage(ERR_QRY_OUTOFMEM), NULL);
+}
+
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << 
"DrillClientImpl::readLenBytesFromSocket: Allocated and locked buffer: [ "
+  << *bufferWithLenBytes << ", 
size = " << bufferWithLenBytesSize << " ]\n";)
+
+// Copy the memory of bufWithLen into bufferWithLenBytesSize
+memcpy((*bufferWithLenBytes)->m_pBuffer, bufWithLen, 
LEN_PREFIX_BUFLEN);
+
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Copied bufWithLen into 
bufferWithLenBytes. "
+  << "Now reading data (rmsgLen - 
leftover) : " << (rmsgLen - leftover)
+  << std::endl;)
+
+// Read the entire data left from socket and copy to currentBuffer.
+ByteBuf_t b = (*bufferWithLenBytes)->m_pBuffer + LEN_PREFIX_BUFLEN;
+size_t bytesToRead = rmsgLen - leftover;
+
+while(1){
+bytes_read = this->m_socket.read_some(boost::asio::buffer(b, 
bytesToRead), error);
+if(error) break;
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Data Message: actual 
bytes read = " << bytes_read << std::endl;)
+if(bytes_read == bytesToRead) break;
+bytesToRead -= bytes_read;
+b += bytes_read;
+}
+} else {
+return handleQryError(QRY_INTERNAL_ERROR, 
getMessage(ERR_QRY_INVREADLEN), NULL);
+}
+
+return error ? handleQryError(QRY_COMM_ERROR, 
getMessage(ERR_QRY_COMMERR, error.message().c_str()), NULL)
+ : QRY_SUCCESS;
+}
+
+
+/*
+ *  Function to read entire RPC message from socket and decode it to 
InboundRpcMessage
+ *  Parameters:
+ *  _buf- in param  - Buffer containing the length bytes.
+ *  allocatedBuffer - out param - Buffer containing the length bytes 
and entire RPC message bytes.
+ *  msg - out param - Decoded InBoundRpcMessage from the 
bytes in allocatedBuffer
+ *  Return:
+ *  status_t- QRY_SUCCESS   - In case of success.
+ *  - 
QRY_COMM_ERROR/QRY_INTERNAL_ERROR/QRY_CLIENT_OUTOFMEM 

[GitHub] drill pull request #809: Drill-4335: C++ client changes for supporting encry...

2017-05-08 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/809#discussion_r114629588
  
--- Diff: contrib/native/client/src/clientlib/utils.cpp ---
@@ -111,4 +111,52 @@ AllocatedBuffer::~AllocatedBuffer(){
 m_bufSize = 0;
 }
 
+EncryptionContext::EncryptionContext(const bool& encryptionReqd, const 
int& wrapChunkSize, const int& rawSendSize) {
+this->m_bEncryptionReqd = encryptionReqd;
+this->m_maxWrapChunkSize = wrapChunkSize;
+this->m_rawWrapSendSize = rawSendSize;
+}
+
+EncryptionContext::EncryptionContext() {
+this->m_bEncryptionReqd = false;
+// SASL Framework only allows 3 octet length field during negotiation 
so maximum wrap message
+// length can be 16MB i.e. 0XFF
--- End diff --

// so maximum wrap message length can be 16MB i.e. 0XFF
Max maxWrapChunkSize has to be strictly less than 16MB. 0XFF = 16MB - 1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #809: Drill-4335: C++ client changes for supporting encry...

2017-05-08 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/809#discussion_r115371638
  
--- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp ---
@@ -854,75 +990,328 @@ void DrillClientImpl::waitForResults(){
 }
 }
 
-status_t DrillClientImpl::readMsg(ByteBuf_t _buf,
-AllocatedBufferPtr* allocatedBuffer,
+/*
+ *  Decode the length of the message from bufWithLen and then read entire 
message from the socket.
+ *  Parameters:
+ *  bufWithLen  - in param  - buffer containing the bytes 
which has length of the RPC message/encrypted chunk
+ *  bufferWithLenBytes  - out param - buffer pointer which points to 
memory allocated in this function and has the
+ *entire one RPC message / 
encrypted chunk along with the length of the message
+ *  lengthBytesLength   - out param - bytes of bufWithLen which 
contains the length of the entire RPC message or
+ *encrypted chunk
+ *  lengthDecodeHandler - in param  - function pointer with length 
decoder to use. For encrypted chunk we use
+ *lengthDecode and for plain RPC 
message we use rpcLengthDecode.
+ *  Return:
+ *  status_t- QRY_SUCCESS- In case of success.
+ *  - 
QRY_COMM_ERROR/QRY_INTERNAL_ERROR/QRY_CLIENT_OUTOFMEM - In cases of error.
+ */
+status_t DrillClientImpl::readLenBytesFromSocket(ByteBuf_t bufWithLen, 
AllocatedBufferPtr* bufferWithLenBytes,
+   uint32_t& lengthBytesLength, lengthDecoder lengthDecodeHandler) 
{
+
+uint32_t rmsgLen = 0;
+size_t bytes_read = 0;
+size_t leftover = 0;
+boost::system::error_code error;
+*bufferWithLenBytes = NULL;
+size_t bufferWithLenBytesSize = 0;
+
+bytes_read = (this->*lengthDecodeHandler)(bufWithLen, rmsgLen);
+lengthBytesLength = bytes_read;
+
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Length bytes = " << bytes_read 
<< std::endl;)
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Msg Length = " << rmsgLen << 
std::endl;)
+
+if(rmsgLen>0){
+leftover = LEN_PREFIX_BUFLEN - bytes_read;
+
+// Allocate a buffer for reading all the bytes in bufWithLen and 
length number of bytes
+   bufferWithLenBytesSize = rmsgLen + bytes_read;
+*bufferWithLenBytes = new AllocatedBuffer(bufferWithLenBytesSize);
+
+if(*bufferWithLenBytes == NULL){
+return handleQryError(QRY_CLIENT_OUTOFMEM, 
getMessage(ERR_QRY_OUTOFMEM), NULL);
+}
+
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << 
"DrillClientImpl::readLenBytesFromSocket: Allocated and locked buffer: [ "
+  << *bufferWithLenBytes << ", 
size = " << bufferWithLenBytesSize << " ]\n";)
+
+// Copy the memory of bufWithLen into bufferWithLenBytesSize
+memcpy((*bufferWithLenBytes)->m_pBuffer, bufWithLen, 
LEN_PREFIX_BUFLEN);
+
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Copied bufWithLen into 
bufferWithLenBytes. "
+  << "Now reading data (rmsgLen - 
leftover) : " << (rmsgLen - leftover)
+  << std::endl;)
+
+// Read the entire data left from socket and copy to currentBuffer.
+ByteBuf_t b = (*bufferWithLenBytes)->m_pBuffer + LEN_PREFIX_BUFLEN;
+size_t bytesToRead = rmsgLen - leftover;
+
+while(1){
+bytes_read = this->m_socket.read_some(boost::asio::buffer(b, 
bytesToRead), error);
+if(error) break;
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Data Message: actual 
bytes read = " << bytes_read << std::endl;)
+if(bytes_read == bytesToRead) break;
+bytesToRead -= bytes_read;
+b += bytes_read;
+}
+} else {
+return handleQryError(QRY_INTERNAL_ERROR, 
getMessage(ERR_QRY_INVREADLEN), NULL);
+}
+
+return error ? handleQryError(QRY_COMM_ERROR, 
getMessage(ERR_QRY_COMMERR, error.message().c_str()), NULL)
+ : QRY_SUCCESS;
+}
+
+
+/*
+ *  Function to read entire RPC message from socket and decode it to 
InboundRpcMessage
+ *  Parameters:
+ *  _buf- in param  - Buffer containing the length bytes.
+ *  allocatedBuffer - out param - Buffer containing the length bytes 
and entire RPC message bytes.
+ *  msg - out param - Decoded InBoundRpcMessage from the 
bytes in allocatedBuffer
+ *  Return:
+ *  status_t- QRY_SUCCESS   - In case of success.
+ *  - 
QRY_COMM_ERROR/QRY_INTERNAL_ERROR/QRY_CLIENT_OUTOFMEM 

[GitHub] drill pull request #809: Drill-4335: C++ client changes for supporting encry...

2017-05-08 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/809#discussion_r114442242
  
--- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp ---
@@ -245,6 +264,64 @@ connectionStatus_t 
DrillClientImpl::sendSync(rpc::OutBoundRpcMessage& msg){
 }
 }
 
+/*
+ * Send handler for sending encrypted messages over wire. It encrypts the 
send buffer using wrap api provided by
+ * saslAuthenticatorImpl and then transmit the encrypted bytes over wire.
+ *
+ * Return:
+ *  connectionStatus_t  -   CONN_SUCCESS - In case of successful send
+ *  -   CONN_FAILURE - In case of failure to send
+ */
+connectionStatus_t DrillClientImpl::sendSyncEncrypted() {
+
+boost::system::error_code ec;
+
+// Split the encoded message into the rawWrapSendSize and then encrypt 
each chunk. Each encrypted chunk along with
+// its encrypted length in network order (added by Cyrus-SASL plugin) 
is sent over wire.
+const int wrapChunkSize = m_encryptionCtxt.getRawWrapSendSize();
+int numChunks = ceil((double)m_wbuf.size() / wrapChunkSize);
--- End diff --

'chunksRemaining' would be better a better name than 'numChunks' because 
the value is decremented per chunk


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #809: Drill-4335: C++ client changes for supporting encry...

2017-05-08 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/809#discussion_r114442866
  
--- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp ---
@@ -245,6 +264,64 @@ connectionStatus_t 
DrillClientImpl::sendSync(rpc::OutBoundRpcMessage& msg){
 }
 }
 
+/*
+ * Send handler for sending encrypted messages over wire. It encrypts the 
send buffer using wrap api provided by
+ * saslAuthenticatorImpl and then transmit the encrypted bytes over wire.
+ *
+ * Return:
+ *  connectionStatus_t  -   CONN_SUCCESS - In case of successful send
+ *  -   CONN_FAILURE - In case of failure to send
+ */
+connectionStatus_t DrillClientImpl::sendSyncEncrypted() {
+
+boost::system::error_code ec;
+
+// Split the encoded message into the rawWrapSendSize and then encrypt 
each chunk. Each encrypted chunk along with
+// its encrypted length in network order (added by Cyrus-SASL plugin) 
is sent over wire.
+const int wrapChunkSize = m_encryptionCtxt.getRawWrapSendSize();
+int numChunks = ceil((double)m_wbuf.size() / wrapChunkSize);
+int lengthToEncrypt = m_wbuf.size();
+int currentChunkLen = std::min(wrapChunkSize, lengthToEncrypt);
+uint32_t startIndex = 0, wrappedLen = 0;
+const char* wrappedChunk = NULL;
+std::stringstream errorMsg;
+
+// Encrypt and send each chunk
+while(numChunks > 0) {
+int wrapResult = m_saslAuthenticator->wrap(reinterpret_cast(m_wbuf.data() + startIndex),
+   currentChunkLen, 
&wrappedChunk, wrappedLen);
+if(SASL_OK != wrapResult) {
+errorMsg << "Sasl wrap failed while encrypting chunk of 
length: " << currentChunkLen << " , EncodeError: "
+ << wrapResult;
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << 
"DrillClientImpl::sendSyncEncrypted - " << errorMsg.str()
+  << " ,StartIndex: " << 
startIndex << ". ChunkNum: " << numChunks
+  << ", Closing connection.";)
+return handleConnError(CONN_FAILURE, 
getMessage(ERR_CONN_WFAIL, errorMsg.str().c_str()));
+}
+
+// Send the encrypted chunk.
+size_t s = m_socket.write_some(boost::asio::buffer(wrappedChunk, 
wrappedLen), ec);
+
+if(ec || s==0){
+errorMsg << "Failure while sending encrypted chunk. Error: " 
<< ec.message().c_str();
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << 
"DrillClientImpl::sendSyncEncrypted - " << errorMsg.str() << " Chunk:"
--- End diff --

Did you want to print the chunk number or chunks remaining? ChunkNum should 
be Total chunks - Chunks Remaining.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #809: Drill-4335: C++ client changes for supporting encry...

2017-05-08 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/809#discussion_r114442367
  
--- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp ---
@@ -245,6 +264,64 @@ connectionStatus_t 
DrillClientImpl::sendSync(rpc::OutBoundRpcMessage& msg){
 }
 }
 
+/*
+ * Send handler for sending encrypted messages over wire. It encrypts the 
send buffer using wrap api provided by
+ * saslAuthenticatorImpl and then transmit the encrypted bytes over wire.
+ *
+ * Return:
+ *  connectionStatus_t  -   CONN_SUCCESS - In case of successful send
+ *  -   CONN_FAILURE - In case of failure to send
+ */
+connectionStatus_t DrillClientImpl::sendSyncEncrypted() {
+
+boost::system::error_code ec;
+
+// Split the encoded message into the rawWrapSendSize and then encrypt 
each chunk. Each encrypted chunk along with
+// its encrypted length in network order (added by Cyrus-SASL plugin) 
is sent over wire.
+const int wrapChunkSize = m_encryptionCtxt.getRawWrapSendSize();
+int numChunks = ceil((double)m_wbuf.size() / wrapChunkSize);
+int lengthToEncrypt = m_wbuf.size();
+int currentChunkLen = std::min(wrapChunkSize, lengthToEncrypt);
+uint32_t startIndex = 0, wrappedLen = 0;
--- End diff --

"currentChunkOffset" would be a better name than 'startIndex'. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #809: Drill-4335: C++ client changes for supporting encry...

2017-05-08 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/809#discussion_r115370795
  
--- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp ---
@@ -854,75 +990,328 @@ void DrillClientImpl::waitForResults(){
 }
 }
 
-status_t DrillClientImpl::readMsg(ByteBuf_t _buf,
-AllocatedBufferPtr* allocatedBuffer,
+/*
+ *  Decode the length of the message from bufWithLen and then read entire 
message from the socket.
+ *  Parameters:
+ *  bufWithLen  - in param  - buffer containing the bytes 
which has length of the RPC message/encrypted chunk
+ *  bufferWithLenBytes  - out param - buffer pointer which points to 
memory allocated in this function and has the
+ *entire one RPC message / 
encrypted chunk along with the length of the message
+ *  lengthBytesLength   - out param - bytes of bufWithLen which 
contains the length of the entire RPC message or
+ *encrypted chunk
+ *  lengthDecodeHandler - in param  - function pointer with length 
decoder to use. For encrypted chunk we use
+ *lengthDecode and for plain RPC 
message we use rpcLengthDecode.
+ *  Return:
+ *  status_t- QRY_SUCCESS- In case of success.
+ *  - 
QRY_COMM_ERROR/QRY_INTERNAL_ERROR/QRY_CLIENT_OUTOFMEM - In cases of error.
+ */
+status_t DrillClientImpl::readLenBytesFromSocket(ByteBuf_t bufWithLen, 
AllocatedBufferPtr* bufferWithLenBytes,
+   uint32_t& lengthBytesLength, lengthDecoder lengthDecodeHandler) 
{
+
+uint32_t rmsgLen = 0;
+size_t bytes_read = 0;
+size_t leftover = 0;
+boost::system::error_code error;
+*bufferWithLenBytes = NULL;
+size_t bufferWithLenBytesSize = 0;
+
+bytes_read = (this->*lengthDecodeHandler)(bufWithLen, rmsgLen);
+lengthBytesLength = bytes_read;
+
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Length bytes = " << bytes_read 
<< std::endl;)
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Msg Length = " << rmsgLen << 
std::endl;)
+
+if(rmsgLen>0){
+leftover = LEN_PREFIX_BUFLEN - bytes_read;
+
+// Allocate a buffer for reading all the bytes in bufWithLen and 
length number of bytes
+   bufferWithLenBytesSize = rmsgLen + bytes_read;
+*bufferWithLenBytes = new AllocatedBuffer(bufferWithLenBytesSize);
+
+if(*bufferWithLenBytes == NULL){
+return handleQryError(QRY_CLIENT_OUTOFMEM, 
getMessage(ERR_QRY_OUTOFMEM), NULL);
+}
+
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << 
"DrillClientImpl::readLenBytesFromSocket: Allocated and locked buffer: [ "
+  << *bufferWithLenBytes << ", 
size = " << bufferWithLenBytesSize << " ]\n";)
+
+// Copy the memory of bufWithLen into bufferWithLenBytesSize
+memcpy((*bufferWithLenBytes)->m_pBuffer, bufWithLen, 
LEN_PREFIX_BUFLEN);
+
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Copied bufWithLen into 
bufferWithLenBytes. "
+  << "Now reading data (rmsgLen - 
leftover) : " << (rmsgLen - leftover)
+  << std::endl;)
+
+// Read the entire data left from socket and copy to currentBuffer.
+ByteBuf_t b = (*bufferWithLenBytes)->m_pBuffer + LEN_PREFIX_BUFLEN;
+size_t bytesToRead = rmsgLen - leftover;
+
+while(1){
+bytes_read = this->m_socket.read_some(boost::asio::buffer(b, 
bytesToRead), error);
+if(error) break;
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Data Message: actual 
bytes read = " << bytes_read << std::endl;)
+if(bytes_read == bytesToRead) break;
+bytesToRead -= bytes_read;
+b += bytes_read;
+}
+} else {
+return handleQryError(QRY_INTERNAL_ERROR, 
getMessage(ERR_QRY_INVREADLEN), NULL);
+}
+
+return error ? handleQryError(QRY_COMM_ERROR, 
getMessage(ERR_QRY_COMMERR, error.message().c_str()), NULL)
+ : QRY_SUCCESS;
+}
+
+
+/*
+ *  Function to read entire RPC message from socket and decode it to 
InboundRpcMessage
+ *  Parameters:
+ *  _buf- in param  - Buffer containing the length bytes.
+ *  allocatedBuffer - out param - Buffer containing the length bytes 
and entire RPC message bytes.
+ *  msg - out param - Decoded InBoundRpcMessage from the 
bytes in allocatedBuffer
+ *  Return:
+ *  status_t- QRY_SUCCESS   - In case of success.
+ *  - 
QRY_COMM_ERROR/QRY_INTERNAL_ERROR/QRY_CLIENT_OUTOFMEM 

[GitHub] drill pull request #809: Drill-4335: C++ client changes for supporting encry...

2017-05-08 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/809#discussion_r115371963
  
--- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp ---
@@ -854,75 +990,328 @@ void DrillClientImpl::waitForResults(){
 }
 }
 
-status_t DrillClientImpl::readMsg(ByteBuf_t _buf,
-AllocatedBufferPtr* allocatedBuffer,
+/*
+ *  Decode the length of the message from bufWithLen and then read entire 
message from the socket.
+ *  Parameters:
+ *  bufWithLen  - in param  - buffer containing the bytes 
which has length of the RPC message/encrypted chunk
+ *  bufferWithLenBytes  - out param - buffer pointer which points to 
memory allocated in this function and has the
+ *entire one RPC message / 
encrypted chunk along with the length of the message
+ *  lengthBytesLength   - out param - bytes of bufWithLen which 
contains the length of the entire RPC message or
+ *encrypted chunk
+ *  lengthDecodeHandler - in param  - function pointer with length 
decoder to use. For encrypted chunk we use
+ *lengthDecode and for plain RPC 
message we use rpcLengthDecode.
+ *  Return:
+ *  status_t- QRY_SUCCESS- In case of success.
+ *  - 
QRY_COMM_ERROR/QRY_INTERNAL_ERROR/QRY_CLIENT_OUTOFMEM - In cases of error.
+ */
+status_t DrillClientImpl::readLenBytesFromSocket(ByteBuf_t bufWithLen, 
AllocatedBufferPtr* bufferWithLenBytes,
+   uint32_t& lengthBytesLength, lengthDecoder lengthDecodeHandler) 
{
+
+uint32_t rmsgLen = 0;
+size_t bytes_read = 0;
+size_t leftover = 0;
+boost::system::error_code error;
+*bufferWithLenBytes = NULL;
+size_t bufferWithLenBytesSize = 0;
+
+bytes_read = (this->*lengthDecodeHandler)(bufWithLen, rmsgLen);
+lengthBytesLength = bytes_read;
+
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Length bytes = " << bytes_read 
<< std::endl;)
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Msg Length = " << rmsgLen << 
std::endl;)
+
+if(rmsgLen>0){
+leftover = LEN_PREFIX_BUFLEN - bytes_read;
+
+// Allocate a buffer for reading all the bytes in bufWithLen and 
length number of bytes
+   bufferWithLenBytesSize = rmsgLen + bytes_read;
+*bufferWithLenBytes = new AllocatedBuffer(bufferWithLenBytesSize);
+
+if(*bufferWithLenBytes == NULL){
+return handleQryError(QRY_CLIENT_OUTOFMEM, 
getMessage(ERR_QRY_OUTOFMEM), NULL);
+}
+
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << 
"DrillClientImpl::readLenBytesFromSocket: Allocated and locked buffer: [ "
+  << *bufferWithLenBytes << ", 
size = " << bufferWithLenBytesSize << " ]\n";)
+
+// Copy the memory of bufWithLen into bufferWithLenBytesSize
+memcpy((*bufferWithLenBytes)->m_pBuffer, bufWithLen, 
LEN_PREFIX_BUFLEN);
+
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Copied bufWithLen into 
bufferWithLenBytes. "
+  << "Now reading data (rmsgLen - 
leftover) : " << (rmsgLen - leftover)
+  << std::endl;)
+
+// Read the entire data left from socket and copy to currentBuffer.
+ByteBuf_t b = (*bufferWithLenBytes)->m_pBuffer + LEN_PREFIX_BUFLEN;
+size_t bytesToRead = rmsgLen - leftover;
+
+while(1){
+bytes_read = this->m_socket.read_some(boost::asio::buffer(b, 
bytesToRead), error);
+if(error) break;
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Data Message: actual 
bytes read = " << bytes_read << std::endl;)
+if(bytes_read == bytesToRead) break;
+bytesToRead -= bytes_read;
+b += bytes_read;
+}
+} else {
+return handleQryError(QRY_INTERNAL_ERROR, 
getMessage(ERR_QRY_INVREADLEN), NULL);
+}
+
+return error ? handleQryError(QRY_COMM_ERROR, 
getMessage(ERR_QRY_COMMERR, error.message().c_str()), NULL)
+ : QRY_SUCCESS;
+}
+
+
+/*
+ *  Function to read entire RPC message from socket and decode it to 
InboundRpcMessage
+ *  Parameters:
+ *  _buf- in param  - Buffer containing the length bytes.
+ *  allocatedBuffer - out param - Buffer containing the length bytes 
and entire RPC message bytes.
+ *  msg - out param - Decoded InBoundRpcMessage from the 
bytes in allocatedBuffer
+ *  Return:
+ *  status_t- QRY_SUCCESS   - In case of success.
+ *  - 
QRY_COMM_ERROR/QRY_INTERNAL_ERROR/QRY_CLIENT_OUTOFMEM 

[GitHub] drill pull request #809: Drill-4335: C++ client changes for supporting encry...

2017-05-08 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/809#discussion_r114442641
  
--- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp ---
@@ -245,6 +264,64 @@ connectionStatus_t 
DrillClientImpl::sendSync(rpc::OutBoundRpcMessage& msg){
 }
 }
 
+/*
+ * Send handler for sending encrypted messages over wire. It encrypts the 
send buffer using wrap api provided by
+ * saslAuthenticatorImpl and then transmit the encrypted bytes over wire.
+ *
+ * Return:
+ *  connectionStatus_t  -   CONN_SUCCESS - In case of successful send
+ *  -   CONN_FAILURE - In case of failure to send
+ */
+connectionStatus_t DrillClientImpl::sendSyncEncrypted() {
+
+boost::system::error_code ec;
+
+// Split the encoded message into the rawWrapSendSize and then encrypt 
each chunk. Each encrypted chunk along with
+// its encrypted length in network order (added by Cyrus-SASL plugin) 
is sent over wire.
+const int wrapChunkSize = m_encryptionCtxt.getRawWrapSendSize();
+int numChunks = ceil((double)m_wbuf.size() / wrapChunkSize);
+int lengthToEncrypt = m_wbuf.size();
+int currentChunkLen = std::min(wrapChunkSize, lengthToEncrypt);
+uint32_t startIndex = 0, wrappedLen = 0;
+const char* wrappedChunk = NULL;
+std::stringstream errorMsg;
+
+// Encrypt and send each chunk
+while(numChunks > 0) {
+int wrapResult = m_saslAuthenticator->wrap(reinterpret_cast(m_wbuf.data() + startIndex),
+   currentChunkLen, 
&wrappedChunk, wrappedLen);
+if(SASL_OK != wrapResult) {
+errorMsg << "Sasl wrap failed while encrypting chunk of 
length: " << currentChunkLen << " , EncodeError: "
+ << wrapResult;
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << 
"DrillClientImpl::sendSyncEncrypted - " << errorMsg.str()
+  << " ,StartIndex: " << 
startIndex << ". ChunkNum: " << numChunks
--- End diff --

Did you want to print the chunk number or chunks remaining? ChunkNum should 
be Total chunks - Chunks Remaining. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #809: Drill-4335: C++ client changes for supporting encry...

2017-05-08 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/809#discussion_r115369295
  
--- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp ---
@@ -854,75 +990,328 @@ void DrillClientImpl::waitForResults(){
 }
 }
 
-status_t DrillClientImpl::readMsg(ByteBuf_t _buf,
-AllocatedBufferPtr* allocatedBuffer,
+/*
+ *  Decode the length of the message from bufWithLen and then read entire 
message from the socket.
+ *  Parameters:
+ *  bufWithLen  - in param  - buffer containing the bytes 
which has length of the RPC message/encrypted chunk
+ *  bufferWithLenBytes  - out param - buffer pointer which points to 
memory allocated in this function and has the
+ *entire one RPC message / 
encrypted chunk along with the length of the message
+ *  lengthBytesLength   - out param - bytes of bufWithLen which 
contains the length of the entire RPC message or
+ *encrypted chunk
+ *  lengthDecodeHandler - in param  - function pointer with length 
decoder to use. For encrypted chunk we use
+ *lengthDecode and for plain RPC 
message we use rpcLengthDecode.
+ *  Return:
+ *  status_t- QRY_SUCCESS- In case of success.
+ *  - 
QRY_COMM_ERROR/QRY_INTERNAL_ERROR/QRY_CLIENT_OUTOFMEM - In cases of error.
+ */
+status_t DrillClientImpl::readLenBytesFromSocket(ByteBuf_t bufWithLen, 
AllocatedBufferPtr* bufferWithLenBytes,
+   uint32_t& lengthBytesLength, lengthDecoder lengthDecodeHandler) 
{
+
+uint32_t rmsgLen = 0;
+size_t bytes_read = 0;
+size_t leftover = 0;
+boost::system::error_code error;
+*bufferWithLenBytes = NULL;
+size_t bufferWithLenBytesSize = 0;
+
+bytes_read = (this->*lengthDecodeHandler)(bufWithLen, rmsgLen);
+lengthBytesLength = bytes_read;
+
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Length bytes = " << bytes_read 
<< std::endl;)
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Msg Length = " << rmsgLen << 
std::endl;)
+
+if(rmsgLen>0){
+leftover = LEN_PREFIX_BUFLEN - bytes_read;
+
+// Allocate a buffer for reading all the bytes in bufWithLen and 
length number of bytes
+   bufferWithLenBytesSize = rmsgLen + bytes_read;
+*bufferWithLenBytes = new AllocatedBuffer(bufferWithLenBytesSize);
+
+if(*bufferWithLenBytes == NULL){
+return handleQryError(QRY_CLIENT_OUTOFMEM, 
getMessage(ERR_QRY_OUTOFMEM), NULL);
+}
+
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << 
"DrillClientImpl::readLenBytesFromSocket: Allocated and locked buffer: [ "
+  << *bufferWithLenBytes << ", 
size = " << bufferWithLenBytesSize << " ]\n";)
+
+// Copy the memory of bufWithLen into bufferWithLenBytesSize
+memcpy((*bufferWithLenBytes)->m_pBuffer, bufWithLen, 
LEN_PREFIX_BUFLEN);
+
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Copied bufWithLen into 
bufferWithLenBytes. "
+  << "Now reading data (rmsgLen - 
leftover) : " << (rmsgLen - leftover)
+  << std::endl;)
+
+// Read the entire data left from socket and copy to currentBuffer.
+ByteBuf_t b = (*bufferWithLenBytes)->m_pBuffer + LEN_PREFIX_BUFLEN;
+size_t bytesToRead = rmsgLen - leftover;
+
+while(1){
+bytes_read = this->m_socket.read_some(boost::asio::buffer(b, 
bytesToRead), error);
+if(error) break;
--- End diff --

Shouldn't EINTR (interrupted) be handled like a temporary failure, with a 
subsequent retry?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (DRILL-5489) Unprotected array access in RepeatedVarCharOutput ctor

2017-05-08 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-5489:
--

 Summary: Unprotected array access in RepeatedVarCharOutput ctor
 Key: DRILL-5489
 URL: https://issues.apache.org/jira/browse/DRILL-5489
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Paul Rogers
Priority: Minor


Suppose a user runs a query of form:
{code}
SELECT columns[7] FROM `dfs`.`mycsv.csv`
{code}

Internally, this will create a {{PathSegment}} to represent the selected 
column. This is passed into the {{RepeatedVarCharOutput}} constructor where it 
is used to set a flag in an array of 64K booleans. But, while the code is very 
diligent of making sure that the column name is "columns" and that the path 
segment is an array, it does not check the array value. Instead:

{code}
for(Integer i : columnIds){
  ...
  fields[i] = true;
}
{code}

We need to add a bounds check to reject array indexes that are not valid: 
negative or above 64K.

While we are at it, we might as well fix another bit of bogus code:

{code}
for(Integer i : columnIds){
  maxField = 0;
  maxField = Math.max(maxField, i);
  ...
}
{code}

The code to compute maxField simply uses the last value, not the maximum value. 
This will be thrown off by a query of the form:

{code}
SELECT columns[20], columns[1] FROM ...
{code}

It may be that the code further up the hierarchy does the checks. But, if so, 
it should do the other checks as well. Leaving the checks incomplete is 
confusing.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (DRILL-5490) RepeatedVarCharOutput.recordStart becomes corrupted on realloc

2017-05-08 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-5490:
--

 Summary: RepeatedVarCharOutput.recordStart becomes corrupted on 
realloc
 Key: DRILL-5490
 URL: https://issues.apache.org/jira/browse/DRILL-5490
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Paul Rogers
Priority: Minor


The class {{RepeatedVarCharOutput}}, used to read text data into the 
{{columns}} array, has a member called {{recordStart}} which is supposed to 
track the memory offset of the start of the record data in the {{columns}} 
VarChar array. However, the logic for the member is very wrong. First, it is 
initialized based on an uninitialized variable:

{{code}}
  public void startBatch() {
this.recordStart = characterDataOriginal;
...
loadVarCharDataAddress();
  }

  private void loadVarCharDataAddress(){
...
this.characterDataOriginal = buf.memoryAddress();
...
  }
{{code}}

Second, the class keeps track of actual memory addresses for the VarChar data. 
When data would exceed the buffer, memory is reallocated using the following 
method. But, this method *does not* adjust the {{recordStart}} member, which 
now points to a bogus memory address:

{code}
  private void expandVarCharData(){
vector.getDataVector().reAlloc();
long diff = characterData - characterDataOriginal;
loadVarCharDataAddress();
characterData += diff;
  }
{code}

Fortunately, like so many bugs in this class, these bugs are offset by the fact 
that the variable is never used in a way that cause obvious problems (else 
these issues would have been found sooner.) The only real use is:

{code}
  @Override
  public boolean rowHasData() {
return recordStart < characterData;
  }
{code}

Which, it turns out, is used only at EOF in {{TextReader.parseRecord()}}:

{code}
} catch(StreamFinishedPseudoException e) {
  // if we've written part of a field or all of a field, we should send 
this row.
  if (fieldsWritten == 0 && !output.rowHasData()) {
throw e;
  }
}
{code}

Thus, the bug will cause a failure only if:

* A batch is resized, and
* The new address is lower than the original address, and
* It is the last batch in the file.

Because of the low probability of hitting the bug, the priority is set to Minor.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (DRILL-5491) NPE when reading a CSV file, with headers, but blank header line

2017-05-08 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-5491:
--

 Summary: NPE when reading a CSV file, with headers, but blank 
header line
 Key: DRILL-5491
 URL: https://issues.apache.org/jira/browse/DRILL-5491
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.8.0
Reporter: Paul Rogers


See DRILL-5490 for background.

Try this unit test case:

{code}
FixtureBuilder builder = ClusterFixture.builder()
.maxParallelization(1);

try (ClusterFixture cluster = builder.build();
 ClientFixture client = cluster.clientFixture()) {
  TextFormatConfig csvFormat = new TextFormatConfig();
  csvFormat.fieldDelimiter = ',';
  csvFormat.skipFirstLine = false;
  csvFormat.extractHeader = true;
  cluster.defineWorkspace("dfs", "data", "/tmp/data", "csv", csvFormat);
  String sql = "SELECT * FROM `dfs.data`.`csv/test7.csv`";
  client.queryBuilder().sql(sql).printCsv();
}
  }
{code}

The test can also be run as a query using your favorite client.

Using this input file:

{code}

a,b,c
d,e,f
{code}

(The first line is blank.)

The following is the result:

{code}
Exception (no rows returned): 
org.apache.drill.common.exceptions.UserRemoteException: 
SYSTEM ERROR: NullPointerException
{code}

The {{RepeatedVarCharOutput}} class tries (but fails for the reasons outlined 
in DRILL-5490) to detect this case.

The code crashes here in {{CompliantTextRecordReader.extractHeader()}}:

{code}
String [] fieldNames = ((RepeatedVarCharOutput)hOutput).getTextOutput();
{code}

Because of bad code in {{RepeatedVarCharOutput.getTextOutput()}}:

{code}
  public String [] getTextOutput () throws ExecutionSetupException {
if (recordCount == 0 || fieldIndex == -1) {
  return null;
}

if (this.recordStart != characterData) {
  throw new ExecutionSetupException("record text was requested before 
finishing record");
}
{code}

Since there is no text on the line, special code elsewhere (see DRILL-5490) 
elects not to increment the {{recordCount}}.  (BTW: {{recordCount}} is the 
total across-batch count, probably the in-batch count, {{batchIndex}}, was 
wanted here.) Since the count is zero, we return null.

But, if the author probably thought we'd get a zero-length record, and the 
if-statement throws an exception in this case. But, see DRILL-5490 about why 
this code does not actually work.

The result is one bug (not incrementing the record count), triggering another 
(returning a null), which masks a third ({{recordStart}} is not set correctly 
so the exception would not be thrown.)

All that bad code is just fun and games until we get an NPE, however.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (DRILL-5492) CSV with spaces for header uses spaces as field name

2017-05-08 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-5492:
--

 Summary: CSV with spaces for header uses spaces as field name
 Key: DRILL-5492
 URL: https://issues.apache.org/jira/browse/DRILL-5492
 Project: Apache Drill
  Issue Type: Bug
Reporter: Paul Rogers
Priority: Minor


Consider the same test case as in DRILL-5491, but with a slightly different 
input file:

{code}
___
a,b,c
d,e,f
{code}

The underscores represent three spaces: use spaces in the real test.

In this case, the code discussed in DRILL-5491 finds some characters and 
happily returns the following array:

{code}
["   "]
{code}

The field name of three blanks is returned to the client to produce the 
following bizarre output:

{code}
2 row(s):

a
d
{code}

The blank line is normally the header, but the header here was considered to be 
three blanks. (In fact, the blanks are actually printed.)

Since the blanks were considered to be a field, the file is assumed to have 
only one field, so only the first column was returned.

The expected behavior is that spaces are trimmed from field names, so the field 
name list would be empty and a User Error thrown. (That is, it is confusing to 
the user why a blank line produces NPE, some produce the 
{{ExecutionSetupException}} shown in DRILL-5491, and some produce blank 
headings. Behavior should be consistent.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)