Re: REPEATED_CONTAINS

2016-03-29 Thread Jacques Nadeau
I think the best answer is to test it and share your findings.
Hypothesizing about performance in complicated systems is also suspect :)

That said, I'll make a guess...

In general, I would expect the flatten to be faster in your example, since a
flatten without a cartesian is a trivial operation and can be done in a
vectorized fashion because of how the data is laid out in memory. This is
different from how complex UDFs are written today (using the FieldReader
model). Those UDFs use object-based execution, record by record, so they get
neither vectorization nor full runtime code generation.

That being said, if you changed your query to something more like [select
a,b,c,d,e,f,g, flatten(t.fillings) as fill], you might see the two come
closer together. That form requires a cartesian copy of all the fields
a through g, which then have to be filtered out. In this case, the extra
cost of the copies might be more expensive than the object overhead of
traversing the complex object structure.

In general, start with the approach that works. If the performance doesn't
satisfy your use case, we can see if we can suggest some things. (For
example, supporting operation pushdowns that push through FLATTEN would
probably be very helpful.)
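The two plans being compared above can be sketched in plain Python (this is an illustration of the shapes of the two approaches, not Drill internals; the function names are made up for the example):

```python
rows = [
    {"name": "classic",
     "fillings": [{"name": "sugar", "cal": 500}, {"name": "flour", "cal": 300}]},
]

def flatten_then_filter(rows):
    # FLATTEN-style plan: emit one output row per array element, then filter,
    # mirroring SELECT flat.fill FROM (... FLATTEN(t.fillings) ...) WHERE ...
    out = []
    for row in rows:
        for fill in row["fillings"]:
            if fill["name"].startswith("sug"):
                out.append(fill)
    return out

def contains_filling(row, prefix):
    # UDF-style plan: walk the array inside each record and return a boolean,
    # like a hand-written REPEATED_CONTAINS for key/value structures
    return any(f["name"].startswith(prefix) for f in row["fillings"])

print(flatten_then_filter(rows))                        # [{'name': 'sugar', 'cal': 500}]
print([r for r in rows if contains_filling(r, "sug")])  # matching parent rows
```

Note the two forms return different things: the flatten query yields the matching array elements, while the UDF-style predicate keeps the enclosing records.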



--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Tue, Mar 29, 2016 at 6:37 PM, Jean-Claude Cote  wrote:

> I've noticed Drill offers REPEATED_CONTAINS, which can be applied to
> fields that are arrays.
>
> https://drill.apache.org/docs/repeated-contains/
>
> I have a schema stored in Parquet files which contains a repeated field
> of key/value pairs. However, such structures can't be queried using
> REPEATED_CONTAINS, so I was thinking of writing a user-defined function
> to look through them.
>
> My question is: is it worth it? Will it be faster than doing this?
>
> {"name":"classic","fillings":[ {"name":"sugar","cal":500} ,
> {"name":"flour","cal":300} ] }
>
> SELECT flat.fill FROM (SELECT FLATTEN(t.fillings) AS fill FROM
> dfs.flatten.`test.json` t) flat WHERE flat.fill.name like 'sug%';
>
> Specifically what's the cost of using FLATTEN compared to iterating over
> the array right in a UDF?
>
> Thanks
> Jean-Claude
>


REPEATED_CONTAINS

2016-03-29 Thread Jean-Claude Cote
I've noticed Drill offers REPEATED_CONTAINS, which can be applied to
fields that are arrays.

https://drill.apache.org/docs/repeated-contains/

I have a schema stored in Parquet files which contains a repeated field
of key/value pairs. However, such structures can't be queried using
REPEATED_CONTAINS, so I was thinking of writing a user-defined function
to look through them.

My question is: is it worth it? Will it be faster than doing this?

{"name":"classic","fillings":[ {"name":"sugar","cal":500} ,
{"name":"flour","cal":300} ] }

SELECT flat.fill FROM (SELECT FLATTEN(t.fillings) AS fill FROM
dfs.flatten.`test.json` t) flat WHERE flat.fill.name like 'sug%';

Specifically what's the cost of using FLATTEN compared to iterating over
the array right in a UDF?

Thanks
Jean-Claude


[GitHub] drill pull request: DRILL-4550: Add support more time units in ext...

2016-03-29 Thread vkorukanti
GitHub user vkorukanti opened a pull request:

https://github.com/apache/drill/pull/453

DRILL-4550: Add support more time units in extract function

Calcite changes are pending in CALCITE-1177

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vkorukanti/drill DRILL-4550

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/453.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #453


commit ca223227cc44052b55e13a4b7525262ec4ec40f8
Author: vkorukanti 
Date:   2016-03-30T00:08:57Z

DRILL-4550: Add support more time units in extract function




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (DRILL-4557) Make complex writer handle also scalars

2016-03-29 Thread Julien Le Dem (JIRA)
Julien Le Dem created DRILL-4557:


 Summary: Make complex writer handle also scalars
 Key: DRILL-4557
 URL: https://issues.apache.org/jira/browse/DRILL-4557
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Julien Le Dem


Currently the complex writer can be used to write an array or a map, but not a scalar.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4556) UDF with FieldReader parameter reading union type fails compilation

2016-03-29 Thread Julien Le Dem (JIRA)
Julien Le Dem created DRILL-4556:


 Summary: UDF with FieldReader parameter reading union type fails compilation
 Key: DRILL-4556
 URL: https://issues.apache.org/jira/browse/DRILL-4556
 Project: Apache Drill
  Issue Type: Bug
Reporter: Julien Le Dem


select foo(a) from mixed
where a is a union vector (say mixed is a JSON file where a is either a string
or an int) and foo is a UDF that has one parameter defined as a FieldReader.
The operator compilation fails because the field is produced as a UnionHolder
instead of a FieldReader.





[jira] [Created] (DRILL-4555) JsonReader does not support nulls in lists

2016-03-29 Thread Julien Le Dem (JIRA)
Julien Le Dem created DRILL-4555:


 Summary: JsonReader does not support nulls in lists
 Key: DRILL-4555
 URL: https://issues.apache.org/jira/browse/DRILL-4555
 Project: Apache Drill
  Issue Type: Bug
Reporter: Julien Le Dem


{noformat}
  case VALUE_NULL:
    throw UserException.unsupportedError()
      .message("Null values are not supported in lists by default. " +
        "Please set `store.json.all_text_mode` to true to read lists containing nulls. " +
        "Be advised that this will treat JSON null values as a string containing the word 'null'.")
      .build(logger);
{noformat}
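The policy described by that error message can be sketched in a few lines of plain Python (an assumed model of the behavior, not the JsonReader source): with all_text_mode on, every scalar in the list is read as text, so a JSON null becomes the string "null".

```python
def read_list(values, all_text_mode=False):
    # Reject nulls in lists by default, mirroring the quoted error message
    if not all_text_mode and any(v is None for v in values):
        raise ValueError("Null values are not supported in lists by default. "
                         "Please set `store.json.all_text_mode` to true.")
    if all_text_mode:
        # All-text mode reads every value as a string; null becomes 'null'
        return ["null" if v is None else str(v) for v in values]
    return values

print(read_list([1, None, 2], all_text_mode=True))  # ['1', 'null', '2']
```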





[jira] [Resolved] (DRILL-4472) Pushing Filter past Union All fails: DRILL-3257 regressed DRILL-2746 but unit test update break test goal

2016-03-29 Thread Sean Hsuan-Yi Chu (JIRA)

 [ https://issues.apache.org/jira/browse/DRILL-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Hsuan-Yi Chu resolved DRILL-4472.
--
   Resolution: Fixed
Fix Version/s: 1.7.0

This was resolved by the fix for DRILL-4476.

> Pushing Filter past Union All fails: DRILL-3257 regressed DRILL-2746 but unit 
> test update break test goal
> -
>
> Key: DRILL-4472
> URL: https://issues.apache.org/jira/browse/DRILL-4472
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Jacques Nadeau
>Assignee: Sean Hsuan-Yi Chu
> Fix For: 1.7.0
>
>
> While reviewing DRILL-4467, I discovered this test:
> https://github.com/apache/drill/blame/master/exec/java-exec/src/test/java/org/apache/drill/TestUnionAll.java#L560
> The test name claims it confirms that the filter is pushed below the union
> all. However, as you can see, the expected result was updated in DRILL-3257
> to a plan which doesn't push the in clause below the filter. I'm disabling
> the test since 4467 happens to remove what becomes a trivial project.
> However, we really should fix the core problem (a regression of
> DRILL-2746).





[GitHub] drill pull request: DRILL-4551: Implement new functions (cot, rege...

2016-03-29 Thread jinfengni
Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/452#discussion_r57819554
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctionHelpers.java ---
@@ -213,11 +213,39 @@ public static long getDate(DrillBuf buf, int start, int end){
     if (BoundsChecking.BOUNDS_CHECKING_ENABLED) {
       buf.checkBytes(start, end);
     }
-    return memGetDate(buf.memoryAddress(), start, end);
+    int[] dateFields = memGetDate(buf.memoryAddress(), start, end);
+    return CHRONOLOGY.getDateTimeMillis(dateFields[0], dateFields[1], dateFields[2], 0);
   }
 
+  /**
+   * Takes a string value, specified as a buffer with a start and end, and
+   * returns true if the value can be read as a date.
+   *
+   * @param buf
+   * @param start
+   * @param end
+   * @return true iff the string value can be read as a date
+   */
+  public static boolean isReadableAsDate(DrillBuf buf, int start, int end){
+    if (BoundsChecking.BOUNDS_CHECKING_ENABLED) {
+      buf.checkBytes(start, end);
+    }
+    int[] dateFields = memGetDate(buf.memoryAddress(), start, end);
--- End diff --

Can we call getDate() directly here, and wrap with a try/catch block? The 
code seems identical to getDate(), except for the try/catch block. 
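The refactor being suggested can be shown in miniature: instead of duplicating the parsing logic, the "is readable" check just calls the parser and catches the failure. (These are Python stand-ins for getDate()/isReadableAsDate(), not the Drill code.)

```python
from datetime import datetime

def get_date(s):
    # the strict parser; raises on bad input, like getDate()
    return datetime.strptime(s, "%Y-%m-%d")

def is_readable_as_date(s):
    # reuse the parser instead of copying its body, per the review comment
    try:
        get_date(s)
        return True
    except ValueError:
        return False

print(is_readable_as_date("2016-03-29"))  # True
print(is_readable_as_date("not a date"))  # False
```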
 




[GitHub] drill pull request: DRILL-4551: Implement new functions (cot, rege...

2016-03-29 Thread jinfengni
Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/452#discussion_r57818552
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/DateTypeFunctions.java ---
@@ -40,6 +41,36 @@
 
 public class DateTypeFunctions {
 
+    /**
+     * Function to check if a varchar value can be cast to a date.
+     *
+     * At the time of writing this function, several other databases were
+     * checked for behavior compatibility. There was not a consensus between
+     * Oracle and SQL Server about the expected behavior of this function,
+     * and Postgres lacks it completely.
+     *
+     * SQL Server appears to have both a DATEFORMAT and a language locale
+     * setting that can change the values accepted by this function. Oracle
+     * appears to support several formats, some of which are not mentioned
+     * in the SQL Server docs. With the lack of standardization, we decided
+     * to implement this function so that it would only consider date strings
+     * that would be accepted by the cast function as valid.
+     */
+    @SuppressWarnings("unused")
+    @FunctionTemplate(name = "isdate", scope = FunctionTemplate.FunctionScope.SIMPLE, nulls = NullHandling.NULL_IF_NULL,
--- End diff --

Have you checked whether isdate() returns null for null input in other
systems like Oracle? I thought it would return either true or false.
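What NULL_IF_NULL means here can be shown in miniature: the engine short-circuits null inputs to null before the function body runs, which is exactly the behavior the reviewer is questioning. (A Python sketch of the semantics, not Drill's code generation.)

```python
import re

def null_if_null(fn):
    # models NullHandling.NULL_IF_NULL: null in, null out, body never runs
    def wrapped(value):
        return None if value is None else fn(value)
    return wrapped

@null_if_null
def isdate(value):
    # toy date check standing in for the real cast-based validation
    return bool(re.fullmatch(r"\d{4}-\d{2}-\d{2}", value))

print(isdate(None))          # None (null in, null out)
print(isdate("2016-03-29"))  # True
print(isdate("29/03/2016"))  # False
```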
 




[jira] [Created] (DRILL-4554) Data type mismatch for union all with timestamp and date

2016-03-29 Thread Krystal (JIRA)
Krystal created DRILL-4554:
--

 Summary: Data type mismatch for union all with timestamp and date
 Key: DRILL-4554
 URL: https://issues.apache.org/jira/browse/DRILL-4554
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: Krystal
Assignee: Sean Hsuan-Yi Chu


Calcite and Drill apply different implicit casts when a union all query
contains timestamp and date columns on the left and right hand sides in
different order.

select col_tmstmp, col_date, col_boln from `prqUnAll_0_v` union all select col_date, col_tmstmp, col_boln from `prqUnAll_1_v`

limit 0: select * from (select col_tmstmp, col_date, col_boln from `prqUnAll_0_v` union all select col_date, col_tmstmp, col_boln from `prqUnAll_1_v`) t limit 0
limit 0: [col_tmstmp, col_date, col_boln]
regular: [col_tmstmp, col_date, col_boln]

limit 0: [DATE, DATE, BOOLEAN]
regular: [TIMESTAMP, TIMESTAMP, BOOLEAN]

limit 0: [columnNullable, columnNullable, columnNullable]
regular: [columnNullable, columnNullable, columnNullable]
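The inconsistency reported above is that the limit-0 path infers DATE while regular execution yields TIMESTAMP. A consistent resolver should pick the same (wider) type regardless of which side lists DATE or TIMESTAMP first, as in this toy rule (not Drill's or Calcite's actual implementation):

```python
RANK = {"DATE": 0, "TIMESTAMP": 1}  # TIMESTAMP is the wider of the two

def union_type(left, right):
    # symmetric widening: the answer doesn't depend on operand order
    return left if RANK[left] >= RANK[right] else right

print(union_type("TIMESTAMP", "DATE"))  # TIMESTAMP
print(union_type("DATE", "TIMESTAMP"))  # TIMESTAMP
```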





[jira] [Created] (DRILL-4552) Treat decimal literals as Double when type inference is taking place

2016-03-29 Thread Sean Hsuan-Yi Chu (JIRA)
Sean Hsuan-Yi Chu created DRILL-4552:


 Summary: Treat decimal literals as Double when type inference is taking place
 Key: DRILL-4552
 URL: https://issues.apache.org/jira/browse/DRILL-4552
 Project: Apache Drill
  Issue Type: Improvement
  Components: Query Planning & Optimization
Reporter: Sean Hsuan-Yi Chu
Assignee: Sean Hsuan-Yi Chu


In the SQL standard, decimal literals (e.g., 1.2, 2.5, etc.) are decimal
types. However, Drill currently always converts them to Double in DrillOptiq.

Since they will be converted to Double in execution anyway, at inference time
we can treat them as Double to help determine the return types.

(The current behavior is not to do any inference if the operand is a Decimal
type.)
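The distinction at stake can be made concrete in plain Python (shown only as illustration; the proposal is about Drill's planner, not this code): a decimal literal kept as an exact decimal type behaves differently from the double it actually executes as, so inferring DOUBLE keeps planning consistent with execution.

```python
from decimal import Decimal

literal = "1.2"
as_decimal = Decimal(literal)   # exact decimal type, per the SQL standard
as_double = float(literal)      # what execution actually uses

print(as_decimal + Decimal("2.5"))  # 3.7, exactly
print(as_double + 2.5)              # the floating-point result
```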







[jira] [Created] (DRILL-4553) Joins using views are not returning results.

2016-03-29 Thread Anton Fernando (JIRA)
Anton Fernando created DRILL-4553:
-

 Summary: Joins using views are not returning results.
 Key: DRILL-4553
 URL: https://issues.apache.org/jira/browse/DRILL-4553
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.6.0, 1.5.0
Reporter: Anton Fernando


I have the following three views:

create view view1 as select . from  where username=user;

create view view2 as select . from view1 as a,  as b where a.col1 = b.col1;

create view view3 as select . from view1 as a,  as b where a.col1 = b.col1;

A select * from each of these views works fine and returns the expected
results. A self join on view2 and view3 also works fine. However, when view2
and view3 are joined on common keys, no rows are returned.








[GitHub] drill pull request: DRILL-4551: Implement new functions (cot, rege...

2016-03-29 Thread jaltekruse
GitHub user jaltekruse opened a pull request:

https://github.com/apache/drill/pull/452

DRILL-4551: Implement new functions (cot, regex_matches, split_part, …

…isdate)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaltekruse/incubator-drill 4551-new-functions

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/452.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #452


commit 0166aab070aa7175175b4a35162fc2502ea3cb90
Author: Jason Altekruse 
Date:   2016-03-28T18:55:11Z

DRILL-4551: Implement new functions (cot, regex_matches, split_part, isdate)






[jira] [Created] (DRILL-4551) Add some missing functions that are generated by Tableau (cot, regex_matches, split_part, isdate)

2016-03-29 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4551:
--

 Summary: Add some missing functions that are generated by Tableau (cot, regex_matches, split_part, isdate)
 Key: DRILL-4551
 URL: https://issues.apache.org/jira/browse/DRILL-4551
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Jason Altekruse
Assignee: Jason Altekruse


Several of these functions do not appear to be standard SQL functions, but they 
are available in several other popular databases like SQL Server, Oracle and 
Postgres.





Re: Configuring User Authentication failed

2016-03-29 Thread Chunhui Shi
Hi Xueping,

The drill-module.conf should be put inside your jar file. The logic that
decides which jars to scan (and thus which classes to extract) during
startup is based on whether there is a drill-module.conf in that jar file.
Hope this helps.

Thanks,

Chunhui

On Tue, Mar 29, 2016 at 2:29 AM, xuepingy...@cienet.com.cn <
xuepingy...@cienet.com.cn> wrote:

> Hi,
> We installed drill-1.6 in distributed mode.
> We are trying to configure a custom authenticator. We followed the steps the
> documentation shows, but it failed.
> 1. Build the following Java file into a JAR file:
> package myorg.dept.drill.security;
>
> import java.io.IOException;
>
> import org.apache.drill.common.config.DrillConfig;
> import org.apache.drill.exec.exception.DrillbitStartupException;
> import org.apache.drill.exec.rpc.user.security.UserAuthenticationException;
> import org.apache.drill.exec.rpc.user.security.UserAuthenticator;
> import org.apache.drill.exec.rpc.user.security.UserAuthenticatorTemplate;
>
>
> /*
>  * Implement {@link
> org.apache.drill.exec.rpc.user.security.UserAuthenticator} for illustrating
> how to develop a custom authenticator and use it in Drill
>  */
> @UserAuthenticatorTemplate(type = "myCustomAuthenticatorType")
> public class MyCustomDrillUserAuthenticatorImpl implements
> UserAuthenticator {
>
>  public static final String USER_1 = "user1";
>  public static final String USER_2 = "user2";
>  public static final String PWD_1 = "pwd1";
>  public static final String PWD_2 = "pwd2";
>
>  /**
>   * Setup for authenticating user credentials.
>   */
>  @Override
>  public void setup(DrillConfig drillConfig) throws
> DrillbitStartupException {
>   // If the authenticator has any setup such as making sure authenticator
>   // provider servers are up and running or
>   // needed libraries are available, it should be added here.
>  }
>
>  /**
>   * Authenticate the given user and password combination.
>   *
>   * @param userName
>   * @param password
>   * @throws UserAuthenticationException
>   * if authentication fails for given user and password.
>   */
>  @Override
>  public void authenticate(String userName, String password)
>throws UserAuthenticationException {
>   System.out.println("==enter==");
>   if (!(USER_1.equals(userName) && PWD_1.equals(password))
> && !(USER_2.equals(userName) && PWD_2.equals(password))) {
>throw new UserAuthenticationException(
>  "custom failure message if the admin wants to show it to user");
>   }
>
>
>  }
>
>  /**
>   * Close the authenticator. Used to release resources. Ex. LDAP
>   * authenticator opens connections to LDAP server, such connections
>   * resources are released in a safe manner as part of close.
>   *
>   * @throws IOException
>   */
>  @Override
>  public void close() throws IOException {
>   // Any clean up such as releasing files/network resources should be done
>   // here
>  }
>
>
> }
> 2. Add the jar file to the path: /jars
> 3. Create a file named drill-module.conf with the following configuration
> code, and add it to: /jars
> drill {
> classpath.scanning {
>   packages += "myorg.dept.drill.security"
> }
>   }
> 4. Modify the drill-override.conf, adding the following configuration code
> in the drill.exec block:
> security.user.auth {
> enabled: true,
> packages += "myorg.dept.drill.security",
> impl: "myCustomAuthenticatorType"
>}
> 5. Restart Drill.
> Then we find that we can't start Drill successfully. Here is the log:
> 2016-03-29 16:49:32,392 [main] ERROR
> o.a.d.e.r.u.s.UserAuthenticatorFactory - Failed to find the implementation
> of 'org.apache.drill.exe
> c.rpc.user.security.UserAuthenticator' for type 'myCustomAuthenticatorType'
> Can you give me some advice?
> Thank you very much.
>
>
> Best Wishes
> Xueping Yang
>
> xuepingy...@cienet.com.cn
>


[GitHub] drill pull request: First pass at test re-factoring

2016-03-29 Thread sudheeshkatkam
Github user sudheeshkatkam commented on the pull request:

https://github.com/apache/drill/pull/135#issuecomment-203057745
  
@aleph-zero I see the last update was a few months back. I know @jaltekruse
mentioned this in the hangout a while back. What is the latest on this (I
forget)?




Re: Drill Hangout Starting

2016-03-29 Thread Jacques Nadeau
Just a quick call today.

Attendees:

Vitalli
Arrina
Laurent
Jacques


Discussion around DESCRIBE SCHEMA & DESCRIBE TABLE: where should they live,
Calcite or Drill?

- Propose initially committing to Drill. Also open two bugs to move to
Calcite once DRILL-3993 is done.

Question around progress on DRILL-3993?

- Jacques and Arrina to both bump thread to figure out next steps.

Question: why does Drill only read, and not write, the Parquet INT96 type?
Should we add an INT96 type?

- INT96 is read so we can convert it to Drill's internal date type when data
is produced by Impala. Write isn't implemented because there is currently no
way to tell Drill to output data in that format (as there is no concept of a
96-bit integer inside Drill).
- Proposal: open a discussion on the Drill mailing list if there is a desire
to add the type. There is also the question of whether the real issue is that
Hive needs to be enhanced to consume Parquet-defined timestamp types. Jacques
noted that extra types can be expensive and Drill probably needs to deprecate
types instead of adding them.
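For background on the INT96 discussion: the Impala/Hive Parquet INT96 timestamp is commonly documented as 8 little-endian bytes of nanoseconds-of-day followed by 4 little-endian bytes of Julian day number. A toy decoder under that assumed layout (not Drill code):

```python
import struct
from datetime import datetime, timedelta

JULIAN_EPOCH_DAY = 2440588  # Julian day number of 1970-01-01

def decode_int96_timestamp(raw12):
    # unpack 8 bytes of nanos-of-day plus 4 bytes of Julian day, little-endian
    nanos, julian_day = struct.unpack("<qi", raw12)
    days = julian_day - JULIAN_EPOCH_DAY
    return datetime(1970, 1, 1) + timedelta(days=days, microseconds=nanos // 1000)

raw = struct.pack("<qi", 0, JULIAN_EPOCH_DAY)  # midnight, 1970-01-01
print(decode_int96_timestamp(raw))  # 1970-01-01 00:00:00
```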

Question: Why does drill have var16char?

- Vitalli making changes to remove var16char from Hive translation
- We should probably remove var16char from Drill v2

Short discussion around removing spill files; Vitalli to update the PR to
clean up earlier than at the end of the JVM.


Thanks to everyone who attended!

Jacques


--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Tue, Mar 29, 2016 at 10:01 AM, Jacques Nadeau  wrote:

> https://plus.google.com/hangouts/_/dremio.com/drillhangout?authuser=0
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>


[GitHub] drill pull request: DRILL-4544: Improve error messages for REFRESH...

2016-03-29 Thread jacques-n
Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/448#issuecomment-203005531
  
Let's open a follow-up bug to move this to Calcite, and get it into Drill for now.




Drill Hangout Starting

2016-03-29 Thread Jacques Nadeau
https://plus.google.com/hangouts/_/dremio.com/drillhangout?authuser=0


--
Jacques Nadeau
CTO and Co-Founder, Dremio


[GitHub] drill-site pull request: DRILL-4409 - Add notice about Postgres ty...

2016-03-29 Thread Serge-Harnyk
GitHub user Serge-Harnyk opened a pull request:

https://github.com/apache/drill-site/pull/1

DRILL-4409 - Add notice about Postgres typing of literals

All in comments here
https://issues.apache.org/jira/browse/DRILL-4409


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Serge-Harnyk/drill-site asf-site

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill-site/pull/1.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1








[GitHub] drill pull request: Drill 3878

2016-03-29 Thread magpierre
GitHub user magpierre opened a pull request:

https://github.com/apache/drill/pull/451

Drill 3878

Please review my fix for JIRA DRILL-3878, providing XML support for Apache
Drill.
The fix utilizes the existing support for JSON by converting XML to JSON
using a simple SAX parser built for the purpose.
The parser tries to produce acceptable JSON documents that are then fed
into the JSONRecordReader for further processing.

To add XML support to Apache Drill, please include the built package in the
3rdparty folder of the built Apache Drill environment, and start.
Add:

"xml": {
  "type": "xml",
  "extensions": [
"xml"
  ],
  "keepPrefix": true
}

to the type section in dfs.
(keepPrefix = false will remove the namespace from tags, since namespaces can
be named differently between documents and are not really part of the tag
name.)

The parser tries to be nice to the Drill / JSON reader by avoiding mixed
types, arranging recurring values in arrays, and removing empty elements, in
order to minimize the number of JSON errors caused by the different natures
of XML and Drill.

Conventions in JSON:
Attributes are named using the convention '@' followed by the attribute name,
and store simple values.
All other objects are stored as objects with a #value field.
This is somewhat conforming with Apache Spark XML, but I need to store all
values in objects in order to avoid as many map-of-different-type problems as
possible.

Current limitations:
DTD tags are currently not supported.
The schema is not validated against XSDs.

Also: since I am not a Drill developer, I might have broken all possible
rules of syntax, format, layout, and test frameworks, as well as the rules
for how to submit pull requests.
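The conventions described above ('@' prefix for attributes, a #value field for element text) can be rendered in a few lines using only the standard library. This mimics the mapping the pull request describes; it is a simplified illustration, not the patch's SAX parser (it omits the array consolidation and empty-element removal the patch performs):

```python
import json
import xml.etree.ElementTree as ET

def element_to_json(elem):
    # attributes become "@name" keys holding simple values
    node = {"@" + k: v for k, v in elem.attrib.items()}
    # element text goes into a "#value" field
    text = (elem.text or "").strip()
    if text:
        node["#value"] = text
    # child elements become nested objects keyed by tag name
    for child in elem:
        node[child.tag] = element_to_json(child)
    return node

doc = ET.fromstring('<book id="1"><title>Drill</title></book>')
print(json.dumps(element_to_json(doc)))
# {"@id": "1", "title": {"#value": "Drill"}}
```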


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/magpierre/drill DRILL-3878

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/451.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #451


commit 844f34a16e75719535ff94c54d5337746ea18c20
Author: MPierre 
Date:   2015-11-05T14:42:06Z

Initial commit

XML support in Apache Drill

commit 592b3af06c2ff45198136577561f2ec1f7caaee0
Author: MPierre 
Date:   2015-11-05T21:21:42Z

Fixed some minor outstanding bugs

EasyRecordReader have a new field userName, and I forgot to change
jsonProcessor to protected from private.

commit 8fad811edab43d3499b41bb66cb419248d11208f
Author: MPierre 
Date:   2015-11-09T08:59:08Z

Merge remote-tracking branch 'apache/master' into DRILL-3878

commit 38f4884fe9b8456c1cde5de44c1e54177301a974
Author: MPierre 
Date:   2016-03-16T11:33:15Z

Syncing to latest release of drill

commit 909c5dec8bdb01bfe0ed358ebc64c959785738df
Author: MPierre 
Date:   2016-03-16T11:34:10Z

syncing to latest release of drill

commit 597d9657d613fa35df2c10dff23681545b13e531
Author: MPierre 
Date:   2016-03-18T08:55:51Z

Cleaned up deliver

Cleaned up the output generated by the SAX Parser, and removed all
unnecessary code.

commit 0cfaa31ab9af89833417288a290d21d0ce88c4ac
Author: MPierre 
Date:   2016-03-18T10:29:51Z

Merge remote-tracking branch 'apache/master' into DRILL-3878

commit aaaff05eb921125ad64854c89c179292c4441fb7
Author: MPierre 
Date:   2016-03-24T13:05:53Z

Adjusted output from Parser to fit Drill better

I have adjusted the SAX parser to produce JSON that Drill likes. Among
the things corrected is to remove empty objects from the tree built.
And to consolidate repeating values in arrays.

commit ba19a356d850224c01b9e807183377b46cf7e545
Author: MPierre 
Date:   2016-03-24T13:10:57Z

Fixed small typo

commit 8ba6705be42c7847d469611ab070b869e0c76d8c
Author: MPierre 
Date:   2016-03-24T21:17:30Z

Further enhancements of the output format to fit Drill

commit e2273f13b8e0136a33c1576c4667f16e23e1631c
Author: MPierre 
Date:   2016-03-24T21:22:41Z

Removed comment

commit c1b6ff8375a7e3c8161167d1a5f2b34ba165e750
Author: MPierre 
Date:   2016-03-29T12:48:53Z

Merge remote-tracking branch 'apache/master' into DRILL-3878





Re: Failure Behavior

2016-03-29 Thread John Omernik
Makes sense Steven. Thanks.  I see what you are saying about complication
and overhead.

On Mon, Mar 28, 2016 at 10:08 PM, Steven Phillips  wrote:

> If a fragment has already begun execution and sent some data to downstream
> fragments, there is no way to simply restart the failed fragment, because
> we would also have to restart any downstream fragments that consumed that
> output, and so on up the tree, as well as restart any leaf fragments that
> fed into any of those fragments. This is because we don't store
> intermediate results to disk.
>
> The case where I think it would even be possible would be if a node died
> before sending any data downstream. But I think the only way to be sure of
> this would be to poll all of the downstream fragments and verify that no
> data from the failed fragment was ever received. I think this would add a
> lot of complication and overhead to Drill.
>
> On Sat, Mar 26, 2016 at 10:03 AM, John Omernik  wrote:
>
> > Thanks for the responses.. So, even if the drillbit that died wasn't the
> > foreman the query would fail? Interesting... Is there any mechanism for
> > reassigning fragments? *try harder* so to speak?  I guess does this play
> > out too if I have a query and say something on that node caused a
> fragment
> > to fail, that it could be tried somewhere else... So I am not trying to
> > recreate map reduce in Drill (although I am sorta asking about similar
> > features), but in a distributed environment, what is the cost to allow
> the
> > foremen to time out a fragment and try again elsewhere. Say there was a
> > heart beat sent back from the bits running a fragment, and if the
> heartbeat
> > and lack of results exceeded 10 seconds, have the foremen try again
> > somewhere else (up to X times configured by a setting).  I am just
> curious
> > here for my own knowledge what makes that hard in a system like Drill.
> >
> > On Sat, Mar 26, 2016 at 10:47 AM, Abdel Hakim Deneche <
> > adene...@maprtech.com
> > > wrote:
> >
> > > the query could succeed is if all fragments that were running on the
> > > now-dead node already finished. Other than that, the query fails.
> > >
> > > On Sat, Mar 26, 2016 at 4:45 PM, Neeraja Rentachintala <
> > > nrentachint...@maprtech.com> wrote:
> > >
> > > > As far as I know, there is no failure handling in Drill. The query
> > dies.
> > > >
> > > > On Sat, Mar 26, 2016 at 7:52 AM, John Omernik 
> > wrote:
> > > >
> > > > > With distributed Drill, what is the expected/desired bit failure
> > > > behavior.
> > > > > I.e. if you are running, and certain fragments end up on a node
> with
> > a
> > > > bit
> > > > > in a flaky state (or a bit that suddenly dies).  What is the
> desired
> > > and
> > > > > actual behavior of the query? I am guessing that if the bit was
> > > foreman,
> > > > > the query dies, I guess that's unavoidable, but if it's just a
> > worker,
> > > > does
> > > > > the foreman detect this and reschedule the fragment or does the
> query
> > > die
> > > > > any way?
> > > > >
> > > > > John
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > Abdelhakim Deneche
> > >
> > > Software Engineer
> > >
> > >   
> > >
> > >
> > > Now Available - Free Hadoop On-Demand Training
> > > <
> > >
> >
> http://www.mapr.com/training?utm_source=Email_medium=Signature_campaign=Free%20available
> > > >
> > >
> >
>


[GitHub] drill pull request: DRILL-3317: when ProtobufLengthDecoder couldn'...

2016-03-29 Thread adeneche
Github user adeneche commented on a diff in the pull request:

https://github.com/apache/drill/pull/446#discussion_r57676038
  
--- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/ProtobufLengthDecoder.java ---
@@ -82,15 +79,7 @@ protected void decode(ChannelHandlerContext ctx, ByteBuf in, List out) t
     } else {
       // need to make buffer copy, otherwise netty will try to refill this buffer if we move the readerIndex forward...
       // TODO: Can we avoid this copy?
-      ByteBuf outBuf;
-      try {
-        outBuf = allocator.buffer(length);
-      } catch (OutOfMemoryException e) {
-        logger.warn("Failure allocating buffer on incoming stream due to memory limits.  Current Allocation: {}.", allocator.getAllocatedMemory());
-        in.resetReaderIndex();
-        outOfMemoryHandler.handle();
-        return;
-      }
+      ByteBuf outBuf = allocator.buffer(length);
--- End diff --

@jacques-n can you confirm it's indeed the case? Thanks.

