[ https://issues.apache.org/jira/browse/DRILL-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16023986#comment-16023986 ]
ASF GitHub Bot commented on DRILL-5432:
---------------------------------------

Github user tdunning commented on a diff in the pull request:

    https://github.com/apache/drill/pull/831#discussion_r118399563

    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/pcap/TestPcapDecoder.java ---
    @@ -0,0 +1,230 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to you under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + * <p>
    + * http://www.apache.org/licenses/LICENSE-2.0
    + * <p>
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.store.pcap;
    +
    +import com.google.common.io.Resources;
    +import org.apache.drill.BaseTestQuery;
    +import org.apache.drill.exec.store.pcap.decoder.Packet;
    +import org.apache.drill.exec.store.pcap.decoder.PacketDecoder;
    +import org.junit.BeforeClass;
    +import org.junit.Test;
    +
    +import java.io.BufferedInputStream;
    +import java.io.DataOutputStream;
    +import java.io.File;
    +import java.io.FileInputStream;
    +import java.io.FileOutputStream;
    +import java.io.IOException;
    +import java.io.InputStream;
    +
    +import static org.junit.Assert.assertEquals;
    +import static org.junit.Assert.assertTrue;
    +
    +public class TestPcapDecoder extends BaseTestQuery {
    +  private static File bigFile;
    +
    +  /**
    +   * Creates an ephemeral file of about a GB in size
    +   *
    +   * @throws IOException If input file can't be read or output can't be written.
    +   */
    +  @BeforeClass
    +  public static void buildBigTcpFile() throws IOException {
    +    bigFile = File.createTempFile("tcp", ".pcap");
    +    bigFile.deleteOnExit();
    +    boolean first = true;
    +    System.out.printf("Building large test file\n");
    +    try (DataOutputStream out = new DataOutputStream(new FileOutputStream(bigFile))) {
    +      for (int i = 0; i < 1000e6 / (29208 - 24) + 1; i++) {
    +        // might be faster to keep this open and rewind each time, but
    +        // that is hard to do with a resource, especially if it comes
    +        // from the class path instead of files.
    +        try (InputStream in = Resources.getResource("store/pcap/tcp-2.pcap").openStream()) {
    +          ConcatPcap.copy(first, in, out);
    +        }
    +        first = false;
    +      }
    +      System.out.printf("Created file is %.1f MB\n", bigFile.length() / 1e6);
    --- End diff --

    I changed those methods to be called from a public static void main().
    That allows them to be used to get information about speeds, but doesn't
    include their output in the test. I think that addresses this comment.


> Want a memory format for PCAP files
> -----------------------------------
>
>                 Key: DRILL-5432
>                 URL: https://issues.apache.org/jira/browse/DRILL-5432
>             Project: Apache Drill
>          Issue Type: New Feature
>            Reporter: Ted Dunning
>
> PCAP files [1] are the de facto standard for storing network capture data. In
> security and protocol applications, it is very common to want to extract
> particular packets from a capture for further analysis.
> At a first level, it is desirable to query and filter by source and
> destination IP and port or by protocol. Beyond that, however, it would be
> very useful to be able to group packets by TCP session and eventually to look
> at packet contents. For now, however, the most critical requirement is that
> we should be able to scan captures at very high speed.
> I previously wrote a (kind of working) proof of concept for a PCAP decoder
> that did lazy deserialization and could traverse hundreds of MB of PCAP data
> per second per core.
> This compares to roughly 2-3 MB/s for widely available
> Apache-compatible open source PCAP decoders.
> This JIRA covers the integration and extension of that proof of concept as a
> Drill file format.
> Initial work is available at https://github.com/mapr-demos/drill-pcap-format
> [1] https://en.wikipedia.org/wiki/Pcap



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)