[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2017-11-11 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/518
  
Further testing revealed limitations in the underlying Jackson parser. That 
parser will not recover from other errors such as:
```
{ a: }
```
See [DRILL-5953](https://issues.apache.org/jira/browse/DRILL-5953) for 
details.


---


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-10-14 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/518
  
+1. Looks like there has been enough review and there is good enough reason 
to merge this in


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-10-08 Thread kfaraaz
Github user kfaraaz commented on the issue:

https://github.com/apache/drill/pull/518
  
The below JSON is invalid, due to presence of duplicate key 'key'. Today 
Drill returns a DATA_READ error, does your proposed fix handle this case too ?
[root@centos-01 ~]# cat f1.json
{"key":"string", "key":123, "key":[1,2,3], "key":true, "key":false, 
"key":null, "key":{"key2":"b"}}

Error returned by Drill 1.9.0

0: jdbc:drill:schema=dfs.tmp> select * from `f1.json`;
Error: DATA_READ ERROR: Error parsing JSON - You tried to write a BigInt 
type when you are using a ValueWriter of type NullableVarCharWriterImpl.

File  /tmp/f1.json
Record  1
Fragment 0:0

[Error Id: 06411bc5-2d59-4681-a84f-3f49086e18c0 on centos-01.qa.lab:31010] 
(state=,code=0)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-10-07 Thread ssriniva123
Github user ssriniva123 commented on the issue:

https://github.com/apache/drill/pull/518
  
I have a similar data set checked in. Hope that is good enough.


On Fri, Oct 7, 2016 at 10:00 AM, Paul Rogers 
wrote:

> Base data set:
>
> { "p": 1, "a": { "x": 10, "y": 20, "z": 30 }, "b": 50, "c": 60 }
> { "p": 2, "a": { "x": 11, "y": 21, "z": 31 }, "b": 51, "c": 61 }
> { "p": 3, "a": { "x": 12, "y": 22, "z": 32 }, "b": 52, "c": 62 }
> { "p": 4, "a": { "x": 13, "y": 23, "z": 33 }, "b": 53, "c": 63 }
> { "p": 5, "a": { "x": 14, "y": 24, "z": 34 }, "b": 54, "c": 64 }
> { "p": 6, "a": { "x": 15, "y": 25, "z": 35 }, "b": 55, "c": 65 }
>
> Create various errors:
>
> { "p": 2x, "a": { "x": 11, "y": 21, "z": 31 }, "b": 51, "c": 61 }
>
> and
>
> { "p": 2, "a": { "x": 11x, "y": 21, "z": 31 }, "b": 51, "c": 61 }
>
> And so on.
>
> In running the tests, I consistently saw that only the second (bad) row
> was omitted. Other rows properly appeared, and no partial row appeared.
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> , or mute
> the thread
> 

> .
>



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-10-07 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/518
  
Base data set:

{ "p": 1, "a": { "x": 10, "y": 20, "z": 30 }, "b": 50, "c": 60 }
{ "p": 2, "a": { "x": 11, "y": 21, "z": 31 }, "b": 51, "c": 61 }
{ "p": 3, "a": { "x": 12, "y": 22, "z": 32 }, "b": 52, "c": 62 }
{ "p": 4, "a": { "x": 13, "y": 23, "z": 33 }, "b": 53, "c": 63 }
{ "p": 5, "a": { "x": 14, "y": 24, "z": 34 }, "b": 54, "c": 64 }
{ "p": 6, "a": { "x": 15, "y": 25, "z": 35 }, "b": 55, "c": 65 }

Create various errors:

{ "p": 2x, "a": { "x": 11, "y": 21, "z": 31 }, "b": 51, "c": 61 }

and

{ "p": 2, "a": { "x": 11x, "y": 21, "z": 31 }, "b": 51, "c": 61 }

And so on.

In running the tests, I consistently saw that only the second (bad) row was 
omitted. Other rows properly appeared, and no partial row appeared.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-10-07 Thread zfong
Github user zfong commented on the issue:

https://github.com/apache/drill/pull/518
  
@paul-rogers - since you had concerns about particular test cases, and it 
looks like you've confirmed that those are non-issues, would it make sense for 
you to share those with @ssriniva123 and he can then include them as unit tests 
with this pull request?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-10-06 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/518
  
Ran some tests. The results look good. In particular, files with nested 
structures produced the correct results. Since it was the nested structure case 
that had me a bit worried, looks like the code is good to go.

+1 (non-binding)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-09-27 Thread ssriniva123
Github user ssriniva123 commented on the issue:

https://github.com/apache/drill/pull/518
  
Paul,
The code you have listed is semantically equivalent to that of what I 
already I have submitted for pull and will not solve handling of all malformed 
json records. Also the code for reporting the 
error records is working correctly as long as is it is reported by the 
Parser correctly.

As I explained earlier the JSON parser is not just a simple tokenizer, it 
keeps track of internal state,
hence the issue. SERDE's in hive etc work because they  are record oriented 
with clean record demarkations using a new line.

One solution is to submit a patch to jackson parser to expose a method to 
skip to new line in the
event of a parsing exception. This can be parametrized so that behavior can 
customized.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-09-27 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/518
  
The open question was how we can discard a partly-built record during 
recovery. As far as I can tell (veterans, please correct me), the 
JSONRecordReader keeps track of the record count. So, all we have to do is not 
increment the count when we want to discard a record. Look in

JSONRecordReader.next( )
  ...
  outside: while(recordCount < DEFAULT_ROWS_PER_BATCH) {
writer.setPosition(recordCount); // Sets the position for the next 
read.
write = jsonReader.write(writer); // Write the record. We can catch 
errors
  // and recover here??
 ...
  recordCount++; // Don't do this on a bad record
  ...
  writer.setValueCount(recordCount); // The record reader controls the 
record count.

This seems to show the elements of a solution:

1. Try to read the record.
2. If a failure occurs, catch it here and clean up, as in the previous post.
3. Don't increment the record count. We reuse the current one on the next 
record read.

Now the only open question is how we clean up the in-flight record in case 
some columns are not present in the next record. Anyone know how to set a 
vector position to null (for optional) default value (for required) or 
zero-length (for repeated)?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-09-26 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/518
  
As it turns out, the sample code shown was actually tested with a stock 
Jackson JSON parser: it does work. No parser changes are needed.

The issue is not whether we can make the parser do what is needed: the code 
posted in the comment above demonstrated a solution.

The issue is how we incorporate that code into the JSON parser to clean up 
partial records and prevent schema changes. When I have time, I'll investigate 
that question in greater depth.

IMHO, without a proper fix, we should simply state that Drill does not 
support malformed JSON. If an input file might be incorrect, run it though a 
clean-up step before allowing Drill to scan it. Otherwise, we are opening the 
door to many hard-to-resolve bugs when people ask Drill to scan corrupt JSON: 
the result, without a proper fix, would be undefined -- which is worse than the 
current behavior that simply fails the scan with an error.

Let's follow up again after I (or someone) has had a chance to figure out 
if we can undo a partially built record. If we can do that, then we've got a 
path to a clean solution: recover the parser (as shown earlier) and discard the 
in-flight record (as we need to research.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-09-26 Thread ssriniva123
Github user ssriniva123 commented on the issue:

https://github.com/apache/drill/pull/518
  
There is not much to do except change the JSON parser to support this 
functionality.

- Indicate to the parser that a current record terminates when it 
encounters a \n (Of course
this then assumes that druid also aligns to record separators using a new 
line).
This is a change to make to the jackson parser.

- Right now the current code works for all the standard cases except one 
case where the inner 
most sub-structure within a JSON is malformed.

Given that this is a great recipe for approximation algorithms , I am 
requesting this change to be pulled in.

If need be we can work on change to the jackson parser using a different 
request.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-09-21 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/518
  
Looks like you are right; the JsonParser is more than a simple tokenizer.

We're not the first to try this: 
http://stackoverflow.com/questions/37511496/recover-from-malformed-json-with-jackson
 (no answer)

I tried an experiment and found that you are on the right track: the way 
you are using the JsonParser can be extended to ignore input until the start of 
the next object. A quick demonstration:

private static void recover(JsonParser parser) throws IOException {
  for ( ; ; ) {
JsonToken token;
try {
  token = parser.nextToken();
} catch( JsonParseException e ) { continue; }
if ( token == null ) return;
if ( token != JsonToken.END_OBJECT ) { continue; }
token = parser.nextToken();
if ( token == null ) return;
if ( token == JsonToken.START_OBJECT ) { return; }
  }
}

Basically, we keep reading tokens, and ignoring errors, until we 
successfully find the } { pair.

As we discussed before, to use the above in Drill, we have to discard the 
partly-built record, and start reading the next record assiming the parser is 
positioned **after** the START_OBJECT ("{") token, which we've already 
consumed. That should be simple.

Still, to do proper recovery, we have to discard the partly-built JSON 
record. I've not looked into how to do that. If we don't do that, we return the 
bogus partly-built record. Worse, if we recover by trying to build a new 
record, we create more partly-built records, but with a different schema, 
possibly triggering a schema change event when not really necessary.

Any ideas for how to solve that problem?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-09-19 Thread ssriniva123
Github user ssriniva123 commented on the issue:

https://github.com/apache/drill/pull/518
  
Apologize for getting back on this thread late, got tied up with some 
issues@work.

Paul,
The json parser is not just a tokenizer, it keeps track of the JSON 
structure and understands various aspects of it like root, array/objectcontext 
and all parsing is done under that context.

- we cannot keep track of {} accurately - For eg: The counting json 
processor does a parser. skipChildren which tries to skip to the end of the 
JSON, but this can rollover to next line when
there is a malformed JSON in the bottom most json sub object - see example 
below (missing " in last json structure). This is similar behavior with the 
JsonReader.

{"balance": 1000.0,"num": 100,"is_vip": true,"name": 
"foo3","curr":{"denom":"pound","test":{"value  :false}}}

- One possible solution is to rewind the input source to reset the stream 
(which is not recommended and there is no guarentee that all streams support 
mark/reset semantics.

Given where we are, I think the solution proposed works perfect for almost 
all malformed JSON's.





---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-09-13 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/518
  
Poking around in the code a bit more, it looks like we rely on the Jackson 
JsonParser which delivers a stream of tokens. Here's the code in JsonReader:

public ReadState write(ComplexWriter writer) throws IOException {
  JsonToken t = parser.nextToken();

As you pointed out, white space is consumed internally to the tokenizer. 
But we do get JSON tokens ({, }, identifier, number, etc.) Given this, you can 
scan ahead to find the close-open bracket pattern.

The trick seems to solve two problems. First, how do we handle tokens: The 
answer seems to be a few lines down:

ReadState readState = writeToVector(writer, t);

Given a start token, we write to vector. writeToVector is a 
recursive-descent parser that consumes tokens and does the right thing. Here, 
it accepts only a top-level object, rejecting all other tokens. For an object, 
we call writeDataSwitch (funny name, that) which recursively writes fields, 
objects, lists and the rest.

Since this is a recursive-descent parser, JSON structure is represented by 
the call stack. To bail out of an invalid parse, we have to unwind the stack, 
which is what an exception does. So, that part may be good.

The next step is how to get the tokenizer (JsonParser) pointed at the right 
point so we can try to read the next object. That is where we need to consume 
tokens until we find the end-start bracket pair.

But, note that we have now consumed the start bracket, so we can't read it 
again when we gain call writeToVector. Checking the code, the JsonParser has no 
unget( ) method, unfortunately. To work around that, we need to "push" the 
token back onto the input. (Either actually doing so, or by having internal 
state that says that we've already read the open bracket.) 

We also have to ask if the JSON parser keeps track of parse state. Looking 
at the code, it seems that JsonParser is really just a tokenizer: it has not 
state about the JSON structure. (Would be good to run a small test to verify 
this observation.)

By the way, the JSON parser class has lots of good stuff already there. For 
example, the parser itself will keep track of line numbers and file locations. 
Perhaps we can use that when reporting error positions.

The last bit is that we've been building up a record as we parser JSON. If 
we fail part way thorugh, we've got a half-build record. Again, here I'm a bit 
hazy on what Drill can do. Can we "unwind" the current record? Can we mark the 
record as one to ignore (with a select vector?) Or, do we live with the 
half-build record? Throwing away the half-built record would be best, if we can 
do it.

All that said, how much of the above does your proposed code change handle? 
What other parts might still need to be added?

Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-09-13 Thread ssriniva123
Github user ssriniva123 commented on the issue:

https://github.com/apache/drill/pull/518
  
Paul,
Thanks for taking the time out in writing out a detailed email. Here are 
some of my thoughts.

- Drill uses the com.fasterxml.jackson.core.json.UTF8StreamJsonParser for 
parsing of JSON records. This parser does not rely on line delimiters for 
record separators but instead uses
the JSON structure as a natural way to signal End of record (EOR). There 
are methods internal
to the parser which check for line feeds but is not exposed to callers.

- The CountingJsonReader uses the parser.skipChildren() method to skip the 
rest of the children for this record, hence it is not possible to accurately 
count and match the no of braces to cleanly skip that bad record.

- One thought is to tap the inputsource of the parser on an exception 
condition, but is not
encouraged.

My thought process was exactly along the lines you have been thinking. On 
an exception scenario the code attempts to locate a closing bracket(}) followed 
by a opening bracket ({).
This is what is being done in the BaseJsonProcessor.processJSONException 
method. Please note that it works in all cases except when we do not have 
proper brackets to signify end of a JSON record. 

Hope this explanation helps clarify.









---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-09-12 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/518
  
Upon reflection, it seems that newline is not an adequate marker to 
separate JSON records. Many of our samples have internal newlines. If a newline 
appears inside the JSON record, then we are subject to the same incorrect 
recovery as illustrated with the "a, x, bar, y" example in the earlier comment.

Further, if the JSON tokenizer is like most, it probably discards 
whitespace, not returning EOL as a token.

So, it seems that the best (or only) option is to scan for the "} {" pair. 
This requires two specific improvements:

* A "token discarder" that uses a state machine to look for the "} {" 
pairs, and
* An indirection around the get-token method so we can push the "{" token 
back onto the input.

These changes, along with the pseudo-code shown earlier may provide as good 
a solution as we can get. (Phrased that way because some errors will cause two 
records to be discarded, as explained earlier.) Combine that with the options 
and error reporting from the original pull request and we are probably pretty 
close.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-09-12 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/518
  
Thanks much for your contribution! Sorry it is taking a while to review the 
code.

I've read over the code changes a couple of times. The options part is 
fine. However, the parser part seems a bit too complex and may not handle all 
cases of interest. I worry that the code may only handle the error shown in the 
test input file: a missing close quote:

{ "foo" : "bar }
{ "foo" : "mumble" }

To do a more general recovery, we have to have some known good recovery 
point. Pure JSON is not suited to recovery. Fortunately, the JSON we read is 
not true JSON, rather it is a file of JSON objects. The file-of-json-object 
structure offers two possible recovery points.

If we require that each JSON record start on a new line, then we can use 
newline as our recovery point:

{ "foo" :: 10 }
{ "foo" : 20 }

In the above, the first record is badly formed. But, we know where the next 
record begins because of the newline separator.

It seems that the code may be using the above approach. But, since the only 
test case is a missing close quote, the parser will "naturally" read all tokens 
up to EOL. But, the implementation seems to not handle the double-colon 
example: the paser will fail on the second token. When resuming, the parser 
will see another colon and fail again. Maybe the parser would eventually fail 
enough to find the EOL. But, we can construct cases where things would fail:

{ "foo" :: 10, "x": { "bar", 20, "mumble" } }

If we don't discard tokens to the newline, the parser may decide that { 
"bar" ... starts a new valid record, with the result of causing a schema change 
when not needed.

The message here is that, on error, we must discard tokens to EOL, we can't 
just try to resume parsing at the point of failure.

If, however, we can't count on the newline, then we need some other 
syntactic trick. Perhaps we can look for "}/s*{" (that is, close bracket, 
optional white space, open bracket):

{ "foo" :: 
{ "foo" : 20 }
{ "foo" : 30 }

Here, we'd actually discard both the first and second records because we'd 
only find the recovery patttern between the second and third records. (The "} 
{" pattern can never occur inside a JSON object, it would have to be "}, {" 
instead.)

Using EOL as delimiter is easiest: we don't need lookahead. But, we require 
that a newline separate records. (Newlines within records can be handled, but 
we'll omit that for now...)

Using "}/s*{" as the recovery is a bit harder: we need to read ahead by one 
token, then push the token back on the input.

If we do the above, then the reader algorithm should look something like 
this:

while ( more records to read ) {
try {
read the record
} catch ( parse exception ) {
read and discard tokens until we get to an EOL (or the "}/s*{")
reset the value vector pointer back one step to discard any 
partially loaded values.
}
}

The above can be shown to be correct by working through a simple state 
machine and set of examples.

Alternatively, it might help to include here (or in comments in code) an 
explanation of the proposed recovery mechanism so it is easier for reviewers to 
verify correctness.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-09-12 Thread ssriniva123
Github user ssriniva123 commented on the issue:

https://github.com/apache/drill/pull/518
  
Folks,
I have tests on my local drill setup with various combinations of invalid 
json format. The only caveat is that some records may be skipped if JSON is not 
properly delimitted by }. Can this be reviewed for next release ? I can work on 
the documentation.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-08-23 Thread Subbu Srinivasan
Thanks Jason.

I am in the process of doing more unit testing, corner case testing etc
before submitting for final review.

Thanks
Subbu

On Mon, Aug 22, 2016 at 9:51 AM, jaltekruse  wrote:

> Github user jaltekruse commented on the issue:
>
> https://github.com/apache/drill/pull/518
>
> After this was discussed at the Hangout a few weeks back, I had been
> thinking about it more.
>
> My initial request was for warnings to be returned along with query
> results. Some initial work was posted last fall to add warnings to the RPC
> protocol, but unfortunately it was not brought to completion. These kinds
> of warnings can be received by JDBC and ODBC sources and many tools show
> them to users.
>
> https://github.com/apache/drill/pull/263
>
> In the hangout we discussed that this changeset is currently logging
> when part of a file is ignored. I don't believe that there is currently an
> expectation that users should have to grep through a log file to find out
> any additional information about the execution of a successful query, the
> log files are there for admins to debug system issues.
>
> I still think it would be good to have a way to remove the need to
> look through the log file to see if this behavior was used in a query, but
> as there wasn't a lot of concern expressed by others when we discussed it,
> I'm changing my vote to a +0.
>
>
> ---
> If your project is set up for it, you can reply to this email and have your
> reply appear on GitHub as well. If your project does not have this feature
> enabled and wishes so, or if the feature is enabled but not working, please
> contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
> with INFRA.
> ---
>


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-08-22 Thread jaltekruse
Github user jaltekruse commented on the issue:

https://github.com/apache/drill/pull/518
  
After this was discussed at the Hangout a few weeks back, I had been 
thinking about it more.

My initial request was for warnings to be returned along with query 
results. Some initial work was posted last fall to add warnings to the RPC 
protocol, but unfortunately it was not brought to completion. These kinds of 
warnings can be received by JDBC and ODBC sources and many tools show them to 
users.

https://github.com/apache/drill/pull/263

In the hangout we discussed that this changeset is currently logging when 
part of a file is ignored. I don't believe that there is currently an 
expectation that users should have to grep through a log file to find out any 
additional information about the execution of a successful query, the log files 
are there for admins to debug system issues.

I still think it would be good to have a way to remove the need to look 
through the log file to see if this behavior was used in a query, but as there 
wasn't a lot of concern expressed by others when we discussed it, I'm changing 
my vote to a +0.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-08-08 Thread Jason Altekruse
Hey Parth and Subbu,

Sorry for missing the last message on this thread, I will be able to attend
the hangout tomorrow to discuss my concern. As I had said previously, I am
mostly trying to make sure that there is agreement about the impact of this
change on user behavior and expectations before we merge.

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Thu, Aug 4, 2016 at 5:11 PM, Parth Chandra  wrote:

> Hi Subbu,
>
>   Yes we can discuss this on the next hangout. If Jason is able to attend
> we can discuss some way to address his concern.
>
> Parth
>
> On Wed, Aug 3, 2016 at 10:24 AM, Subbu Srinivasan  >
> wrote:
>
> > Hi Folks,
> > When can we discuss this feature? Would next hangout be appropriate?
> >
> > Thanks
> > Subbu
> >
> > On Mon, Jul 25, 2016 at 10:20 AM, Subbu Srinivasan <
> > ssriniva...@zscaler.com>
> > wrote:
> >
> > > This mechanism falls in line with other JSON processing similar to
> > serde's
> > > with Hive, UDF's enabled at global level will apply to all users and is
> > > outlined using documentation.
> > >
> > >
> > > What is your stance if we move to the JSONFormatPlugin?
> > >
> > > On Fri, Jul 15, 2016 at 2:08 PM, jaltekruse 
> wrote:
> > >
> > >> Github user jaltekruse commented on the issue:
> > >>
> > >> https://github.com/apache/drill/pull/518
> > >>
> > >> I don't think we should merge this without a mechanism to return a
> > >> warning to the user to tell them at least that some data was ignored,
> > and
> > >> ideally some indication of how much data was discarded. While I do
> > >> understand this is not the default behavior, I think there is still
> too
> > >> high of a risk that an admin could set this at a global level and
> users
> > >> would be unaware of some of their data being discarded.
> > >>
> > >> I am willing to discuss the benefits of merging this before such a
> > >> system exists, but until this issue has been thoroughly evaluated I am
> > -1
> > >> on the change.
> > >>
> > >> One improvement you could make to the current implementation is
> > >> moving the option to the format plugin instead of the system/session
> > list.
> > >> This enables users to include setting the option in there query with
> the
> > >> "table with options" syntax that was added last fall. We already have
> a
> > >> JIRA open for moving the all_text_mode and read_numbers_as_double
> > options
> > >> to this location, because it doesn't really make sense to change query
> > >> results based on session state. Unfortunately this change does not
> > >> completely remove my initial concern, because not all users can modify
> > or
> > >> see the storage plugins in the case when web UI security is enabled.
> > >> Non-admin users in these cases could be surprised by this behavior.
> > >>
> > >> For examples of how this is done, you can look at the text plugin
> > >> config, you would just need to add these options as properties to the
> > json
> > >> config which is currently mostly empty.
> > >>
> > >> https://github.com/apache/drill/blob/master/exec/java-
> > exec/src/main/java/org/apache/drill/exec/store/easy/json/
> > JSONFormatPlugin.java#L93
> > >>
> > >>
> > >> https://github.com/apache/drill/blob/master/exec/java-
> > exec/src/main/java/org/apache/drill/exec/store/easy/text/
> > TextFormatPlugin.java#L135
> > >>
> > >> Select with options: https://issues.apache.org/
> > jira/browse/DRILL-4047
> > >> Jira for moving the existing options:
> > >> https://issues.apache.org/jira/browse/DRILL-4206
> > >>
> > >>
> > >> ---
> > >> If your project is set up for it, you can reply to this email and have
> > >> your
> > >> reply appear on GitHub as well. If your project does not have this
> > feature
> > >> enabled and wishes so, or if the feature is enabled but not working,
> > >> please
> > >> contact infrastructure at infrastruct...@apache.org or file a JIRA
> > ticket
> > >> with INFRA.
> > >> ---
> > >>
> > >
> > >
> >
>


Re: [GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-08-04 Thread Parth Chandra
Hi Subbu,

  Yes we can discuss this on the next hangout. If Jason is able to attend
we can discuss some way to address his concern.

Parth

On Wed, Aug 3, 2016 at 10:24 AM, Subbu Srinivasan 
wrote:

> Hi Folks,
> When can we discuss this feature? Would next hangout be appropriate?
>
> Thanks
> Subbu
>
> On Mon, Jul 25, 2016 at 10:20 AM, Subbu Srinivasan <
> ssriniva...@zscaler.com>
> wrote:
>
> > This mechanism falls in line with other JSON processing similar to
> serde's
> > with Hive, UDF's enabled at global level will apply to all users and is
> > outlined using documentation.
> >
> >
> > What is your stance if we move to the JSONFormatPlugin?
> >
> > On Fri, Jul 15, 2016 at 2:08 PM, jaltekruse  wrote:
> >
> >> Github user jaltekruse commented on the issue:
> >>
> >> https://github.com/apache/drill/pull/518
> >>
> >> I don't think we should merge this without a mechanism to return a
> >> warning to the user to tell them at least that some data was ignored,
> and
> >> ideally some indication of how much data was discarded. While I do
> >> understand this is not the default behavior, I think there is still too
> >> high of a risk that an admin could set this at a global level and users
> >> would be unaware of some of their data being discarded.
> >>
> >> I am willing to discuss the benefits of merging this before such a
> >> system exists, but until this issue has been thoroughly evaluated I am
> -1
> >> on the change.
> >>
> >> One improvement you could make to the current implementation is
> >> moving the option to the format plugin instead of the system/session
> list.
> >> This enables users to include setting the option in there query with the
> >> "table with options" syntax that was added last fall. We already have a
> >> JIRA open for moving the all_text_mode and read_numbers_as_double
> options
> >> to this location, because it doesn't really make sense to change query
> >> results based on session state. Unfortunately this change does not
> >> completely remove my initial concern, because not all users can modify
> or
> >> see the storage plugins in the case when web UI security is enabled.
> >> Non-admin users in these cases could be surprised by this behavior.
> >>
> >> For examples of how this is done, you can look at the text plugin
> >> config, you would just need to add these options as properties to the
> json
> >> config which is currently mostly empty.
> >>
> >> https://github.com/apache/drill/blob/master/exec/java-
> exec/src/main/java/org/apache/drill/exec/store/easy/json/
> JSONFormatPlugin.java#L93
> >>
> >>
> >> https://github.com/apache/drill/blob/master/exec/java-
> exec/src/main/java/org/apache/drill/exec/store/easy/text/
> TextFormatPlugin.java#L135
> >>
> >> Select with options: https://issues.apache.org/
> jira/browse/DRILL-4047
> >> Jira for moving the existing options:
> >> https://issues.apache.org/jira/browse/DRILL-4206
> >>
> >>
> >> ---
> >> If your project is set up for it, you can reply to this email and have
> >> your
> >> reply appear on GitHub as well. If your project does not have this
> feature
> >> enabled and wishes so, or if the feature is enabled but not working,
> >> please
> >> contact infrastructure at infrastruct...@apache.org or file a JIRA
> ticket
> >> with INFRA.
> >> ---
> >>
> >
> >
>


Re: [GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-08-03 Thread Subbu Srinivasan
Hi Folks,
When can we discuss this feature? Would next hangout be appropriate?

Thanks
Subbu

On Mon, Jul 25, 2016 at 10:20 AM, Subbu Srinivasan 
wrote:

> This mechanism falls in line with other JSON processing similar to serde's
> with Hive, UDF's enabled at global level will apply to all users and is
> outlined using documentation.
>
>
> What is your stance if we move to the JSONFormatPlugin?
>
> On Fri, Jul 15, 2016 at 2:08 PM, jaltekruse  wrote:
>
>> Github user jaltekruse commented on the issue:
>>
>> https://github.com/apache/drill/pull/518
>>
>> I don't think we should merge this without a mechanism to return a
>> warning to the user to tell them at least that some data was ignored, and
>> ideally some indication of how much data was discarded. While I do
>> understand this is not the default behavior, I think there is still too
>> high of a risk that an admin could set this at a global level and users
>> would be unaware of some of their data being discarded.
>>
>> I am willing to discuss the benefits of merging this before such a
>> system exists, but until this issue has been thoroughly evaluated I am -1
>> on the change.
>>
>> One improvement you could make to the current implementation is
>> moving the option to the format plugin instead of the system/session list.
>> This enables users to include setting the option in there query with the
>> "table with options" syntax that was added last fall. We already have a
>> JIRA open for moving the all_text_mode and read_numbers_as_double options
>> to this location, because it doesn't really make sense to change query
>> results based on session state. Unfortunately this change does not
>> completely remove my initial concern, because not all users can modify or
>> see the storage plugins in the case when web UI security is enabled.
>> Non-admin users in these cases could be surprised by this behavior.
>>
>> For examples of how this is done, you can look at the text plugin
>> config, you would just need to add these options as properties to the json
>> config which is currently mostly empty.
>>
>> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/JSONFormatPlugin.java#L93
>>
>>
>> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/TextFormatPlugin.java#L135
>>
>> Select with options: https://issues.apache.org/jira/browse/DRILL-4047
>> Jira for moving the existing options:
>> https://issues.apache.org/jira/browse/DRILL-4206
>>
>>
>> ---
>> If your project is set up for it, you can reply to this email and have
>> your
>> reply appear on GitHub as well. If your project does not have this feature
>> enabled and wishes so, or if the feature is enabled but not working,
>> please
>> contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
>> with INFRA.
>> ---
>>
>
>


Re: [GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-07-25 Thread Subbu Srinivasan
This mechanism falls in line with other JSON processing similar to serde's
with Hive, UDF's enabled at global level will apply to all users and is
outlined using documentation.


What is your stance if we move to the JSONFormatPlugin?

On Fri, Jul 15, 2016 at 2:08 PM, jaltekruse  wrote:

> Github user jaltekruse commented on the issue:
>
> https://github.com/apache/drill/pull/518
>
> I don't think we should merge this without a mechanism to return a
> warning to the user to tell them at least that some data was ignored, and
> ideally some indication of how much data was discarded. While I do
> understand this is not the default behavior, I think there is still too
> high of a risk that an admin could set this at a global level and users
> would be unaware of some of their data being discarded.
>
> I am willing to discuss the benefits of merging this before such a
> system exists, but until this issue has been thoroughly evaluated I am -1
> on the change.
>
> One improvement you could make to the current implementation is moving
> the option to the format plugin instead of the system/session list. This
> enables users to include setting the option in there query with the "table
> with options" syntax that was added last fall. We already have a JIRA open
> for moving the all_text_mode and read_numbers_as_double options to this
> location, because it doesn't really make sense to change query results
> based on session state. Unfortunately this change does not completely
> remove my initial concern, because not all users can modify or see the
> storage plugins in the case when web UI security is enabled. Non-admin
> users in these cases could be surprised by this behavior.
>
> For examples of how this is done, you can look at the text plugin
> config, you would just need to add these options as properties to the json
> config which is currently mostly empty.
>
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/JSONFormatPlugin.java#L93
>
>
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/TextFormatPlugin.java#L135
>
> Select with options: https://issues.apache.org/jira/browse/DRILL-4047
> Jira for moving the existing options:
> https://issues.apache.org/jira/browse/DRILL-4206
>
>
> ---
> If your project is set up for it, you can reply to this email and have your
> reply appear on GitHub as well. If your project does not have this feature
> enabled and wishes so, or if the feature is enabled but not working, please
> contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
> with INFRA.
> ---
>


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-07-15 Thread jaltekruse
Github user jaltekruse commented on the issue:

https://github.com/apache/drill/pull/518
  
I don't think we should merge this without a mechanism to return a warning 
to the user to tell them at least that some data was ignored, and ideally some 
indication of how much data was discarded. While I do understand this is not 
the default behavior, I think there is still too high of a risk that an admin 
could set this at a global level and users would be unaware of some of their 
data being discarded.

I am willing to discuss the benefits of merging this before such a system 
exists, but until this issue has been thoroughly evaluated I am -1 on the 
change.

One improvement you could make to the current implementation is moving the 
option to the format plugin instead of the system/session list. This enables 
users to include setting the option in there query with the "table with 
options" syntax that was added last fall. We already have a JIRA open for 
moving the all_text_mode and read_numbers_as_double options to this location, 
because it doesn't really make sense to change query results based on session 
state. Unfortunately this change does not completely remove my initial concern, 
because not all users can modify or see the storage plugins in the case when 
web UI security is enabled. Non-admin users in these cases could be surprised 
by this behavior.

For examples of how this is done, you can look at the text plugin config, 
you would just need to add these options as properties to the json config which 
is currently mostly empty.

https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/JSONFormatPlugin.java#L93


https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/TextFormatPlugin.java#L135

Select with options: https://issues.apache.org/jira/browse/DRILL-4047
Jira for moving the existing options: 
https://issues.apache.org/jira/browse/DRILL-4206


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-07-14 Thread chunhui-shi
Github user chunhui-shi commented on the issue:

https://github.com/apache/drill/pull/518
  
Thanks for providing a way to skip bad json records and addressing my 
comments.
Besides the minor improvements I suggested above: comment format changes 
that may be due to line width in your env.,  and unit tests that should check 
the expected output as much as possible, could you please squash these multiple 
commits into one commit? 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-07-11 Thread jinfengni
Github user jinfengni commented on the issue:

https://github.com/apache/drill/pull/518
  
@chunhui-shi , I saw you made comments days ago. Can you pls take a look at 
the new patch to see if it addressed your comment? thx. 
 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-06-24 Thread ssriniva123
Github user ssriniva123 commented on the issue:

https://github.com/apache/drill/pull/518
  
I have made several modifications to get accurate line nos:

- The main change is to move the json parser to the end of the current 
record being processed since
previously multiple exceptions were thrown.

- BaseJsonProcessor contains new method protected 
JsonExceptionProcessingState processJSONException().
-This is called by both CountingJsonReader and JsonProcessor whenever 
parser encounters a jackson parsing exception.
-I have also added a new system setting 
store.json.reader.print_skipped_invalid_record_number so that we can suppress 
printing of line numbers.
- Added more unit test cases.
- System tested with various combinations on a local drill bit.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-06-22 Thread ssriniva123
Github user ssriniva123 commented on the issue:

https://github.com/apache/drill/pull/518
  
What is the recommendation with this bug? Should we just remove the line 
nos for now and just name the file name. I was not able to work on this last 
couple of days, not sure if there is sufficient time until the next candidate 
release.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-06-17 Thread amansinha100
Github user amansinha100 commented on the issue:

https://github.com/apache/drill/pull/518
  
hmm.. I still see 4 commits in the pull request.  Can you squash them into 
one ? (let me know if you need help with that).  Also, the commit message needs 
to be in the format:  "DRILL-4653: Malformed json"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-06-16 Thread amansinha100
Github user amansinha100 commented on the issue:

https://github.com/apache/drill/pull/518
  
Looks much better.  Sorry for the nitpick but I still have a couple more 
related to the coding conventions. :)Also, could you squash the commits 
into 1 and use the DRILL-:   format for the commit ?   
thanks !


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-06-16 Thread ssriniva123
Github user ssriniva123 commented on the issue:

https://github.com/apache/drill/pull/518
  
I have made changes are recommended by reviewers:

- Changed JSON_READER_SKIP_INVALID_RECORDS_FLAG constant
- Modified unit test to use builder framework
- Code indendation changes.  



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-06-15 Thread ssriniva123
Github user ssriniva123 commented on the issue:

https://github.com/apache/drill/pull/518
  
I updated the pull request with more changes requested by the reviewers.

On Wed, Jun 15, 2016 at 5:28 PM, Aman Sinha 
wrote:

> Yes, it does in fact have a conflict with DRILL-3764 which has changes to
> the JsonRecordReader, although this issue is still in progress. I noticed
> that @adeneche  mentioned this in the JIRA.
> @ssriniva123  did you get a chance to
> look at DRILL-3764 ?
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> , or mute
> the thread
> 

> .
>



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-06-15 Thread ssriniva123
Github user ssriniva123 commented on the issue:

https://github.com/apache/drill/pull/518
  
For some reason the build terminated- What needs to be done to fire it 
again?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-06-15 Thread ssriniva123
Github user ssriniva123 commented on the issue:

https://github.com/apache/drill/pull/518
  
I have added support so that it prints the offending file/line nos.

Error parsing JSON in DRILL-4653.json : line nos :13
Error parsing JSON in DRILL-4653.json : line nos :14


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-06-15 Thread ssriniva123
Github user ssriniva123 commented on the issue:

https://github.com/apache/drill/pull/518
  
Yes, I looked at DRILL-3764. The changes made to JSONProcessing do not 
conflict with my changes.
The only conflict I see is these two constants, I have defined my own.

 String ENABLE_SKIP_INVALID_RECORD_KEY = "exec.enable_skip_invalid_record";
 +  BooleanValidator ENABLE_SKIP_INVALID_RECORD = new 
BooleanValidator(ENABLE_SKIP_INVALID_RECORD_KEY, false);




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-06-15 Thread amansinha100
Github user amansinha100 commented on the issue:

https://github.com/apache/drill/pull/518
  
Yes, it does in fact have a conflict with DRILL-3764 which has changes to 
the JsonRecordReader, although this issue is still in progress.   I noticed 
that @adeneche  mentioned this in the JIRA. @ssriniva123 did you get a chance 
to look at DRILL-3764 ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-06-15 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/518
  
Does this conflict with Drill 3764 (see 
https://github.com/apache/drill/pull/497)?. I think there was a broader 
discussion around this as part of enabling the functionality of 
skipping/logging erroneous records. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-06-15 Thread ssriniva123
Github user ssriniva123 commented on the issue:

https://github.com/apache/drill/pull/518
  
Do u anticipate this to be logged using log4j?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-06-15 Thread amansinha100
Github user amansinha100 commented on the issue:

https://github.com/apache/drill/pull/518
  
A high level comment I have about this PR is that keeping the number of 
'malformed' records in the json file may not be sufficient..I would expect the 
user would want to know the file name and the location within the file for such 
record.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---