Re: ***UNCHECKED*** Re: Query Error on PCAP over MapR FS

2017-09-11 Thread Ted Dunning
This stack trace makes it clear that this is a bug in the PCAP decoder,
caused by a misunderstanding of how to force large files to be read in one
batch on a single Drillbit.

Are there some real Drill experts out there who can provide hints about how
to avoid this?



On Tue, Sep 12, 2017 at 5:03 AM, Takeo Ogawara 
wrote:

> Sorry, I'll paste the plain text this time.
>
> > 2017-09-11 15:06:52,390 [BitServer-2] WARN  
> > o.a.d.exec.rpc.control.WorkEventBus
> - A fragment message arrived but there was no registered listener for that
> message: profile {
> >   state: FAILED
> >   error {
> > error_id: "bbf284b6-9da4-4869-ac20-fa100eed11b9"
> > endpoint {
> >   address: "node22"
> >   user_port: 31010
> >   control_port: 31011
> >   data_port: 31012
> >   version: "1.11.0"
> > }
> > error_type: SYSTEM
> > message: "SYSTEM ERROR: IllegalStateException: Bad magic number =
> 0a0d0d0a\n\nFragment 1:200\n\n[Error Id: bbf284b6-9da4-4869-ac20-fa100eed11b9
> on node22:31010]"
> > exception {
> >   exception_class: "java.lang.IllegalStateException"
> >   message: "Bad magic number = 0a0d0d0a"
> >   stack_trace {
> > class_name: "com.google.common.base.Preconditions"
> > file_name: "Preconditions.java"
> > line_number: 173
> > method_name: "checkState"
> > is_native_method: false
> >   }
> >   stack_trace {
> > class_name: "org.apache.drill.exec.store.
> pcap.decoder.PacketDecoder"
> > file_name: "PacketDecoder.java"
> > line_number: 84
> > method_name: ""
> > is_native_method: false
> >   }
> >   stack_trace {
> > class_name: "org.apache.drill.exec.store.pcap.PcapRecordReader"
> > file_name: "PcapRecordReader.java"
> > line_number: 104
> > method_name: "setup"
> > is_native_method: false
> >   }
> >   stack_trace {
> > class_name: "org.apache.drill.exec.physical.impl.ScanBatch"
> > file_name: "ScanBatch.java"
> > line_number: 104
> > method_name: ""
> > is_native_method: false
> >   }
> >   stack_trace {
> > class_name: "org.apache.drill.exec.store.
> dfs.easy.EasyFormatPlugin"
> > file_name: "EasyFormatPlugin.java"
> > line_number: 166
> > method_name: "getReaderBatch"
> > is_native_method: false
> >   }
> >   stack_trace {
> > class_name: "org.apache.drill.exec.store.dfs.easy.
> EasyReaderBatchCreator"
> > file_name: "EasyReaderBatchCreator.java"
> > line_number: 35
> > method_name: "getBatch"
> > is_native_method: false
> >   }
> >   stack_trace {
> > class_name: "org.apache.drill.exec.store.dfs.easy.
> EasyReaderBatchCreator"
> > file_name: "EasyReaderBatchCreator.java"
> > line_number: 28
> > method_name: "getBatch"
> > is_native_method: false
> >   }
> >   stack_trace {
> > class_name: "org.apache.drill.exec.physical.impl.ImplCreator"
> > file_name: "ImplCreator.java"
> > line_number: 156
> > method_name: "getRecordBatch"
> > is_native_method: false
> >   }
> >   stack_trace {
> > class_name: "org.apache.drill.exec.physical.impl.ImplCreator"
> > file_name: "ImplCreator.java"
> > line_number: 179
> > method_name: "getChildren"
> > is_native_method: false
> >   }
> >   stack_trace {
> > class_name: "org.apache.drill.exec.physical.impl.ImplCreator"
> > file_name: "ImplCreator.java"
> > line_number: 136
> > method_name: "getRecordBatch"
> > is_native_method: false
> >   }
> >   stack_trace {
> > class_name: "org.apache.drill.exec.physical.impl.ImplCreator"
> > file_name: "ImplCreator.java"
> > line_number: 179
> > method_name: "getChildren"
> > is_native_method: false
> >   }
> >   stack_trace {
> > class_name: "org.apache.drill.exec.physical.impl.ImplCreator"
> > file_name: "ImplCreator.java"
> > line_number: 136
> > method_name: "getRecordBatch"
> > is_native_method: false
> >   }
> >   stack_trace {
> > class_name: "org.apache.drill.exec.physical.impl.ImplCreator"
> > file_name: "ImplCreator.java"
> > line_number: 179
> > method_name: "getChildren"
> > is_native_method: false
> >   }
> >   stack_trace {
> > class_name: "org.apache.drill.exec.physical.impl.ImplCreator"
> > file_name: "ImplCreator.java"
> > line_number: 109
> > method_name: "getRootExec"
> > is_native_method: false
> >   }
> >   stack_trace {
> > class_name: "org.apache.drill.exec.physical.impl.ImplCreator"
> > file_name: "ImplCreator.java"
> > line_number: 87
> > method_name: 

Re: ***UNCHECKED*** Re: Query Error on PCAP over MapR FS

2017-09-11 Thread Ted Dunning
On Tue, Sep 12, 2017 at 4:53 AM, Takeo Ogawara 
wrote:

>
>
> > Is it absolutely required to query large files like this? Would it be
> > acceptable to split the file first by making a quick scan over it?
> No, loading a large file isn’t strictly required.
> In fact, this large PCAP file was created by concatenating small PCAP files
> with the mergecap command.
> So there is no problem with feeding the small PCAP files into Drill.
>
> How can I analyze a number of PCAP files together?
>

Simply specify a directory instead of a file. If the directory contains
PCAP files, then you will query those files as if they are one table.

You can also specify a wildcard so that you query just some of the files; a
couple of illustrative queries are sketched below.
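
For example, assuming a workspace mfs.root whose root directory contains a
subdirectory `pcaps` of capture files (the directory and file names here are
illustrative, not taken from the original mail):

  -- read every PCAP file under pcaps/ as a single table
  select count(*) from mfs.root.`pcaps`;

  -- use a wildcard to restrict the query to a subset of the files
  select * from mfs.root.`pcaps/capture-2017-09-*.pcap` limit 10;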


Re: ***UNCHECKED*** Re: Query Error on PCAP over MapR FS

2017-09-11 Thread Takeo Ogawara
Sorry, I'll paste the plain text this time.

> 2017-09-11 15:06:52,390 [BitServer-2] WARN  
> o.a.d.exec.rpc.control.WorkEventBus - A fragment message arrived but there 
> was no registered listener for that message: profile {
>   state: FAILED
>   error {
> error_id: "bbf284b6-9da4-4869-ac20-fa100eed11b9"
> endpoint {
>   address: "node22"
>   user_port: 31010
>   control_port: 31011
>   data_port: 31012
>   version: "1.11.0"
> }
> error_type: SYSTEM
> message: "SYSTEM ERROR: IllegalStateException: Bad magic number = 
> 0a0d0d0a\n\nFragment 1:200\n\n[Error Id: bbf284b6-9da4-4869-ac20-fa100eed11b9 
> on node22:31010]"
> exception {
>   exception_class: "java.lang.IllegalStateException"
>   message: "Bad magic number = 0a0d0d0a"
>   stack_trace {
> class_name: "com.google.common.base.Preconditions"
> file_name: "Preconditions.java"
> line_number: 173
> method_name: "checkState"
> is_native_method: false
>   }
>   stack_trace {
> class_name: "org.apache.drill.exec.store.pcap.decoder.PacketDecoder"
> file_name: "PacketDecoder.java"
> line_number: 84
> method_name: ""
> is_native_method: false
>   }
>   stack_trace {
> class_name: "org.apache.drill.exec.store.pcap.PcapRecordReader"
> file_name: "PcapRecordReader.java"
> line_number: 104
> method_name: "setup"
> is_native_method: false
>   }
>   stack_trace {
> class_name: "org.apache.drill.exec.physical.impl.ScanBatch"
> file_name: "ScanBatch.java"
> line_number: 104
> method_name: ""
> is_native_method: false
>   }
>   stack_trace {
> class_name: "org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin"
> file_name: "EasyFormatPlugin.java"
> line_number: 166
> method_name: "getReaderBatch"
> is_native_method: false
>   }
>   stack_trace {
> class_name: "org.apache.drill.exec.store.dfs.easy.EasyReaderBatchCreator"
> file_name: "EasyReaderBatchCreator.java"
> line_number: 35
> method_name: "getBatch"
> is_native_method: false
>   }
>   stack_trace {
> class_name: "org.apache.drill.exec.store.dfs.easy.EasyReaderBatchCreator"
> file_name: "EasyReaderBatchCreator.java"
> line_number: 28
> method_name: "getBatch"
> is_native_method: false
>   }
>   stack_trace {
> class_name: "org.apache.drill.exec.physical.impl.ImplCreator"
> file_name: "ImplCreator.java"
> line_number: 156
> method_name: "getRecordBatch"
> is_native_method: false
>   }
>   stack_trace {
> class_name: "org.apache.drill.exec.physical.impl.ImplCreator"
> file_name: "ImplCreator.java"
> line_number: 179
> method_name: "getChildren"
> is_native_method: false
>   }
>   stack_trace {
> class_name: "org.apache.drill.exec.physical.impl.ImplCreator"
> file_name: "ImplCreator.java"
> line_number: 136
> method_name: "getRecordBatch"
> is_native_method: false
>   }
>   stack_trace {
> class_name: "org.apache.drill.exec.physical.impl.ImplCreator"
> file_name: "ImplCreator.java"
> line_number: 179
> method_name: "getChildren"
> is_native_method: false
>   }
>   stack_trace {
> class_name: "org.apache.drill.exec.physical.impl.ImplCreator"
> file_name: "ImplCreator.java"
> line_number: 136
> method_name: "getRecordBatch"
> is_native_method: false
>   }
>   stack_trace {
> class_name: "org.apache.drill.exec.physical.impl.ImplCreator"
> file_name: "ImplCreator.java"
> line_number: 179
> method_name: "getChildren"
> is_native_method: false
>   }
>   stack_trace {
> class_name: "org.apache.drill.exec.physical.impl.ImplCreator"
> file_name: "ImplCreator.java"
> line_number: 109
> method_name: "getRootExec"
> is_native_method: false
>   }
>   stack_trace {
> class_name: "org.apache.drill.exec.physical.impl.ImplCreator"
> file_name: "ImplCreator.java"
> line_number: 87
> method_name: "getExec"
> is_native_method: false
>   }
>   stack_trace {
> class_name: "org.apache.drill.exec.work.fragment.FragmentExecutor"
> file_name: "FragmentExecutor.java"
> line_number: 207
> method_name: "run"
> is_native_method: false
>   }
>   stack_trace {
> class_name: "org.apache.drill.common.SelfCleaningRunnable"
> file_name: "SelfCleaningRunnable.java"
> line_number: 38
> method_name: "run"
> is_native_method: false
>   }
>   stack_trace {
> class_name: "..."
> line_number: 0
> 

Re: Query Error on PCAP over MapR FS

2017-09-11 Thread Ted Dunning
On Mon, Sep 11, 2017 at 11:23 AM, Takeo Ogawara  wrote:

> ...
>
> 1. Query error when cluster-name is not specified
> ...
>
> With this setting, the following query failed.
> > select * from mfs.`x.pcap` ;
> > Error: DATA_READ ERROR: /x.pcap (No such file or directory)
> >
> > File name: /x.pcap
> > Fragment 0:0
> >
> > [Error Id: 70b73062-c3ed-4a10-9a88-034b4e6d039a on node21:31010]
> (state=,code=0)
>
> But these queries passed.
> > select * from mfs.root.`x.pcap` ;
> > select * from mfs.`x.csv`;
> > select * from mfs.root.`x.csv`;
>

As Andries mentioned, the problem here has to do with understanding how Drill
resolves and manipulates paths; it has nothing to do with the PCAP
capabilities.
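
For what it's worth, qualifying the workspace explicitly (or switching to it
with USE first) takes the path-resolution guesswork out of the picture while
experimenting. An illustrative session, not taken from the original report:

  use mfs.root;
  select * from `x.pcap` limit 10;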

Usually, what I do is put entries into the configuration that point directly
to the directory above my data, but I can't add anything to Andries' comment.


> 2. Large PCAP file
> A query on a very large PCAP file (larger than 100GB) failed with the
> following error message.
> > Error: SYSTEM ERROR: IllegalStateException: Bad magic number = 0a0d0d0a
> >
> > Fragment 1:169
> >
> > [Error Id: 8882c359-c253-40c0-866c-417ef1ce5aa3 on node22:31010]
> (state=,code=0)
>
> This happens even on a local Linux FS, not just MapR FS.
>

Can you provide the stack trace from the Drillbit that hit the problem?

I suspect that this has to do with splitting of the PCAP file. Normally, it
is assumed that parallelism will be achieved by having lots of smaller
files since it is difficult to jump into the middle of a PCAP file and get
good results.

Even if we disable splitting to avoid this error, you will have the
complementary problem of slow queries due to single-threading. That doesn't
seem very satisfactory either.

A similar problem is that splitting a PCAP file pretty much requires a
single-threaded read of the file in question. The read doesn't need to
process very much data, but it does need to touch the whole file.

Is it absolutely required to query large files like this? Would it be
acceptable to split the file first by making a quick scan over it?
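
For reference, the Wireshark command-line suite that includes mergecap also
includes editcap, which can pre-split a capture without any custom code. The
chunk size and file names below are illustrative only:

  # split big.pcap into a numbered series of files of at most 1,000,000 packets each
  mkdir -p pcap-chunks
  editcap -c 1000000 big.pcap pcap-chunks/part.pcap

Drill could then query the pcap-chunks directory as one table, with one reader
per file.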


Re: Querying MapR-DB JSON Tables not returning results when specifying columns or CF's

2017-09-11 Thread Padma Penumarthy
Do you think this is a regression? Can you try with Drill 1.11?

Thanks,
Padma


> On Sep 11, 2017, at 10:21 AM, Andries Engelbrecht  
> wrote:
> 
> Created a MapR-DB JSON table, but I am not able to query data when specifying a
> column or CF.
> 
> When doing a select *, the data is returned.
> 
> i.e.
> 
> 0: jdbc:drill:> select * from dfs.maprdb.`/sdc/nycbike` b limit 1;
> ++---+---++-+-+-+---++---+-+---+-+--+-+-+--+--+---++-+---+
> |_id |  age  |  arc  | avg_speed_mph  | bikeid  | 
> birth year  | end station id  | end station latitude  | end station longitude 
>  | end station name  | gender  | start station id  | start station latitude  
> | start station longitude  | start station name  | start_date  |  
> starttime   |   stoptime   | tripduration  |   tripid 
>   |  usertype   |  station  |
> ++---+---++-+-+-+---++---+-+---+-+--+-+-+--+--+---++-+---+
> | 2017-04-01 00:00:58-25454  | 51.0  | 0.39  | 7.2| 25454   | 
> 1966.0  | 430 | 40.7014851| -73.98656928  
>  | York St & Jay St  | M   | 217   | 40.70277159 
> | -73.99383605 | Old Fulton St   | 2017-04-01  | 2017-04-01 
> 00:00:58  | 2017-04-01 00:04:14  | 195   | 2017-04-01 00:00:58-25454  
> | Subscriber  | {"end station id":"430"}  |
> ++---+---++-+-+-+---++---+-+---+-+--+-+-+--+--+---++-+---+
> 1 row selected (0.191 seconds)
> 
> 
> However, trying to specify a column or CF name, nothing is returned.
> 
> Specifying a column name:
> 
> 0: jdbc:drill:> select bikeid from dfs.maprdb.`/sdc/nycbike` b limit 10;
> +--+
> |  |
> +--+
> +--+
> No rows selected (0.067 seconds)
> 
> 0: jdbc:drill:> select b.bikeid from dfs.maprdb.`/sdc/nycbike` b limit 1;
> +--+
> |  |
> +--+
> +--+
> No rows selected (0.062 seconds)
> 
> 
> Specifying a CF name gives the same result.
> 
> 0: jdbc:drill:> select b.station from dfs.maprdb.`/sdc/nycbike` b limit 1;
> +--+
> |  |
> +--+
> +--+
> No rows selected (0.063 seconds)
> 
> 
> This is Drill 1.10, and the user has full read/write/traverse permissions on the table.
> 
> 
> 
> 
> Thanks
> 
> Andries



Re: Workaround for drill queries during node failure

2017-09-11 Thread Padma Penumarthy
Did you mean to say “we could not execute any queries”?

We need more details about the configuration you have.
When you say the data is available on other nodes, is it because you
have replication configured (assuming it is DFS)?

What exactly are you trying, and what error do you see when you
execute the query?
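
One quick check that may help (sys.drillbits is the system table listing the
Drillbits currently registered with ZooKeeper; the query below is only a
suggestion):

select * from sys.drillbits;

Comparing its output before and after a node goes down will show whether the
remaining Drillbits are still registered.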

Thanks,
Padma


On Sep 11, 2017, at 9:40 AM, Kshitija Shinde 
> wrote:

Hi,

We have installed Drill in distributed mode. While testing the Drillbits we have
observed that if one of the nodes is down then we could execute any queries
against Drill even if the data is available on other nodes.



Is there any workaround for this?



Thanks,

Kshitija



Querying MapR-DB JSON Tables not returning results when specifying columns or CF's

2017-09-11 Thread Andries Engelbrecht
Created a MapR-DB JSON table, but I am not able to query data when specifying a 
column or CF.

When doing a select *, the data is returned.

i.e.

0: jdbc:drill:> select * from dfs.maprdb.`/sdc/nycbike` b limit 1;
++---+---++-+-+-+---++---+-+---+-+--+-+-+--+--+---++-+---+
|_id |  age  |  arc  | avg_speed_mph  | bikeid  | birth 
year  | end station id  | end station latitude  | end station longitude  | end 
station name  | gender  | start station id  | start station latitude  | start 
station longitude  | start station name  | start_date  |  starttime   | 
  stoptime   | tripduration  |   tripid   |  usertype   
|  station  |
++---+---++-+-+-+---++---+-+---+-+--+-+-+--+--+---++-+---+
| 2017-04-01 00:00:58-25454  | 51.0  | 0.39  | 7.2| 25454   | 
1966.0  | 430 | 40.7014851| -73.98656928   
| York St & Jay St  | M   | 217   | 40.70277159 | 
-73.99383605 | Old Fulton St   | 2017-04-01  | 2017-04-01 
00:00:58  | 2017-04-01 00:04:14  | 195   | 2017-04-01 00:00:58-25454  | 
Subscriber  | {"end station id":"430"}  |
++---+---++-+-+-+---++---+-+---+-+--+-+-+--+--+---++-+---+
1 row selected (0.191 seconds)


However, trying to specify a column or CF name, nothing is returned.

Specifying a column name:

0: jdbc:drill:> select bikeid from dfs.maprdb.`/sdc/nycbike` b limit 10;
+--+
|  |
+--+
+--+
No rows selected (0.067 seconds)

0: jdbc:drill:> select b.bikeid from dfs.maprdb.`/sdc/nycbike` b limit 1;
+--+
|  |
+--+
+--+
No rows selected (0.062 seconds)


Specifying a CF name gives the same result.

0: jdbc:drill:> select b.station from dfs.maprdb.`/sdc/nycbike` b limit 1;
+--+
|  |
+--+
+--+
No rows selected (0.063 seconds)


This is Drill 1.10, and the user has full read/write/traverse permissions on the table.




Thanks

Andries


Workaround for drill queries during node failure

2017-09-11 Thread Kshitija Shinde
Hi,

We have installed Drill in distributed mode. While testing the Drillbits we have
observed that if one of the nodes is down then we could execute any queries
against Drill even if the data is available on other nodes.



Is there any workaround for this?



Thanks,

Kshitija


Re: Query Error on PCAP over MapR FS

2017-09-11 Thread Andries Engelbrecht
Typically when you use the MapR-FS plugin you don’t need to specify the cluster 
root path in the dfs workspace.

Instead of "location": "/mapr/cluster3",   use "location": "/",

"connection": "maprfs:///", already points to the default MapR cluster root.

--Andries



On 9/11/17, 2:23 AM, "Takeo Ogawara"  wrote:

Dear all,

I’m using the PCAP storage plugin over MapR FS (5.2.0) with Drill (1.11.0), 
compiled as follows.
$ mvn clean install -DskipTests -Pmapr

Some queries cause errors, as shown below.
Does anyone know how to solve these errors?

1. Query error when cluster-name is not specified
The storage plugin “mfs” setting is as follows.

> "type": "file",
>   "enabled": true,
>   "connection": "maprfs:///",
>   "config": null,
>   "workspaces": {
> "root": {
>   "location": "/mapr/cluster3",
>   "writable": false,
>   "defaultInputFormat": null
> }
>   }


With this setting, the following query failed.
> select * from mfs.`x.pcap` ;
> Error: DATA_READ ERROR: /x.pcap (No such file or directory)
> 
> File name: /x.pcap
> Fragment 0:0
> 
> [Error Id: 70b73062-c3ed-4a10-9a88-034b4e6d039a on node21:31010] 
(state=,code=0)

But these queries passed.
> select * from mfs.root.`x.pcap` ;
> select * from mfs.`x.csv`;
> select * from mfs.root.`x.csv`;

2. Large PCAP file
A query on a very large PCAP file (larger than 100GB) failed with the following 
error message.
> Error: SYSTEM ERROR: IllegalStateException: Bad magic number = 0a0d0d0a
> 
> Fragment 1:169
> 
> [Error Id: 8882c359-c253-40c0-866c-417ef1ce5aa3 on node22:31010] 
(state=,code=0)

This happens even on a local Linux FS, not just MapR FS.

Thank you.








Query Error on PCAP over MapR FS

2017-09-11 Thread Takeo Ogawara
Dear all,

I’m using the PCAP storage plugin over MapR FS (5.2.0) with Drill (1.11.0), compiled 
as follows.
$ mvn clean install -DskipTests -Pmapr

Some queries cause errors, as shown below.
Does anyone know how to solve these errors?

1. Query error when cluster-name is not specified
The storage plugin “mfs” setting is as follows.

> "type": "file",
>   "enabled": true,
>   "connection": "maprfs:///",
>   "config": null,
>   "workspaces": {
> "root": {
>   "location": "/mapr/cluster3",
>   "writable": false,
>   "defaultInputFormat": null
> }
>   }


With this setting, the following query failed.
> select * from mfs.`x.pcap` ;
> Error: DATA_READ ERROR: /x.pcap (No such file or directory)
> 
> File name: /x.pcap
> Fragment 0:0
> 
> [Error Id: 70b73062-c3ed-4a10-9a88-034b4e6d039a on node21:31010] 
> (state=,code=0)

But these queries passed.
> select * from mfs.root.`x.pcap` ;
> select * from mfs.`x.csv`;
> select * from mfs.root.`x.csv`;

2. Large PCAP file
A query on a very large PCAP file (larger than 100GB) failed with the following 
error message.
> Error: SYSTEM ERROR: IllegalStateException: Bad magic number = 0a0d0d0a
> 
> Fragment 1:169
> 
> [Error Id: 8882c359-c253-40c0-866c-417ef1ce5aa3 on node22:31010] 
> (state=,code=0)

This happens even on a local Linux FS, not just MapR FS.

Thank you.