[GitHub] gaodayue commented on issue #6036: use S3 as a backup storage for hdfs deep storage

2018-07-23 Thread GitBox
gaodayue commented on issue #6036: use S3 as a backup storage for hdfs deep 
storage
URL: https://github.com/apache/incubator-druid/pull/6036#issuecomment-407288499
 
 
   Hi @jihoonson, thanks for your comments. Answering your questions:
   
   > I think, if this is the case, you might need to somehow increase write 
throughput of your HDFS or use a separate deep storage. If the first option is 
not available for you, does it make sense to use only S3 as your deep storage?
   
   Our company operates its own large Hadoop cluster (>5k nodes) for us to use. 
Switching to s3-deep-storage would incur extra cost and is not an option for us.
   
   > Maybe we need to define the concept of backup deep storage for all deep 
storage types and support it. 
   
   At first I thought about implementing something like a composite-deep-storage 
which could add backup abilities to all deep storages, but found it non-trivial 
to load multiple deep storage extensions inside composite-deep-storage. So I 
decided to add support for hdfs-deep-storage only, simply because that's what 
we're using.
   
   > Maybe the primary deep storage and backup deep storage should be in sync 
automatically.
   
   What do you mean by "in sync"? Do you mean all segments pushed to backup 
storage should be copied back to primary storage eventually? If that's the 
case, I don't think there is a strong need for it (explained below). 
   
   > But, this PR is restricted to support it for only HDFS deep storage and 
looks to require another tool, called restore-hdfs-segment, to keep all 
segments to reside in HDFS. This would need additional operations which make 
Druid operation difficult.
   
   First, the restore-hdfs-segment tool is not required to achieve HDFS fault 
tolerance. I developed it for other reasons: one is to pay less for S3, and the 
other is that we occasionally need to migrate a datasource from one cluster to 
another, and we want all segments to reside on HDFS so that we can simply use 
the insert-segment-to-db tool to migrate all segments. Users who don't share 
these concerns can simply ignore restore-hdfs-segment.
   
   Second, concerning operational complexity, I think it's just a trade-off 
between availability and cost. The extra operational burden is as low as 
running restore-hdfs-segment manually after an HDFS failure, or setting up a 
daily crontab to run it.
   
   > Kafka indexing service guarantees exactly-once data ingestion, and thus 
data loss is never expected to happen. If deep storage is not available, all 
attempts to publish segments would fail and every task should restart from the 
same offset when publishing failed. 
   
   Yeah, I'm aware of that. But for other reasons we are still using tranquility 
as our main ingestion tool, and HDFS failures have caused data loss several 
times, which is a big pain for us. We've added this feature to solve the 
problem, and I think it may also be useful for other people.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
For additional commands, e-mail: dev-h...@druid.apache.org



[GitHub] gianm commented on issue #6037: snapshot quickstart error

2018-07-23 Thread GitBox
gianm commented on issue #6037: snapshot quickstart error
URL: 
https://github.com/apache/incubator-druid/issues/6037#issuecomment-407269663
 
 
   I have seen this happen with older versions of javac; perhaps try upgrading 
that (looks like your jdk is 1.8.0_60).





Re: Release process for Maven artifacts?

2018-07-23 Thread Gian Merlino
Hi Joseph,

For official releases we just do `mvn release:prepare release:perform`. The
poms encode everything that has to happen, including sources JARs, javadoc
JARs, signing everything, and pushing it up to Maven Central. You might be
able to modify them to push to a different repo.
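
For example, a sketch of such an override (untested; the repository ids and
URLs below are placeholders that would need matching `<server>` entries in your
settings.xml):

```xml
<!-- Hypothetical override in your copy of the root pom. release:perform runs
     the deploy phase, which honors distributionManagement. -->
<distributionManagement>
  <repository>
    <id>my-company-releases</id>
    <url>https://repo.example.com/maven/releases</url>
  </repository>
  <snapshotRepository>
    <id>my-company-snapshots</id>
    <url>https://repo.example.com/maven/snapshots</url>
  </snapshotRepository>
</distributionManagement>
```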

On Mon, Jul 23, 2018 at 5:06 PM Joseph Glanville  wrote:

> Hi,
>
> Is the release process for publishing Maven artifacts documented somewhere?
> We have been building tar archives with `mvn package` successfully but we
> would like to publish our own Maven artifacts also, including `-sources`
> JARs so we can add them as dependencies to projects outside of the Druid
> source tree.
>
> Joseph.
>


[GitHub] AlexanderSaydakov opened a new issue #6037: snapshot quickstart error

2018-07-23 Thread GitBox
AlexanderSaydakov opened a new issue #6037: snapshot quickstart error
URL: https://github.com/apache/incubator-druid/issues/6037
 
 
   I am trying to follow the quick start guide using a snapshot built from 
master: http://druid.io/docs/latest/tutorials/quickstart.html
   All services appear to start and the wiki data indexing succeeds, but the 
data never becomes fully available (red dot in the console view).
   Looking at the logs, the following error in log/server-coordinator.log seems 
to be relevant:
   `
   2018-07-24T00:47:29,773 ERROR 
[LeaderSelector[/druid/coordinator/_COORDINATOR]] 
org.apache.curator.framework.listen.ListenerContainer - Listener 
(io.druid.curator.discovery.CuratorDruidLeaderSelector$1@7e15f4d4) threw an 
exception
   java.lang.VerifyError: Bad type on operand stack
   Exception Details:
 Location:
   
io/druid/server/coordinator/DruidCoordinator$CoordinatorHistoricalManagerRunnable.<init>(Lio/druid/server/coordinator/DruidCoordinator;I)V
 @16: invokedynamic
 Reason:
   Type uninitializedThis (current frame, stack[3]) is not assignable to 
'io/druid/server/coordinator/DruidCoordinator$CoordinatorHistoricalManagerRunnable'
 Current Frame:
   bci: @16
   flags: { flagThisUninit }
   locals: { uninitializedThis, 
'io/druid/server/coordinator/DruidCoordinator', integer }
   stack: { uninitializedThis, 
'io/druid/server/coordinator/DruidCoordinator', 
'io/druid/server/coordinator/helper/DruidCoordinatorSegmentInfoLoader', 
uninitializedThis }
 Bytecode:
   0x000: 2a2b b500 012a 2bbb 0002 592b b700 032a
   0x010: ba00 0400 00bb 0005 592b b700 06bb 0007
   0x020: 592b b700 08bb 0009 592b b700 0abb 000b
   0x030: 592b b700 0cbb 000d 592b b700 0eb8 000f
   0x040: 1cb7 0010 b1   
   
   at 
io.druid.server.coordinator.DruidCoordinator.becomeLeader(DruidCoordinator.java:540)
 ~[druid-server-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
   at 
io.druid.server.coordinator.DruidCoordinator.access$000(DruidCoordinator.java:96)
 ~[druid-server-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
   at 
io.druid.server.coordinator.DruidCoordinator$1.becomeLeader(DruidCoordinator.java:495)
 ~[druid-server-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
   at 
io.druid.curator.discovery.CuratorDruidLeaderSelector$1.isLeader(CuratorDruidLeaderSelector.java:98)
 ~[druid-server-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
   at 
org.apache.curator.framework.recipes.leader.LeaderLatch$9.apply(LeaderLatch.java:665)
 ~[curator-recipes-4.0.0.jar:4.0.0]
   at 
org.apache.curator.framework.recipes.leader.LeaderLatch$9.apply(LeaderLatch.java:661)
 ~[curator-recipes-4.0.0.jar:4.0.0]
   at 
org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93)
 [curator-framework-4.0.0.jar:4.0.0]
   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[?:1.8.0_60]
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[?:1.8.0_60]
   at java.lang.Thread.run(Thread.java:745) [?:1.8.0_60]
   `





Release process for Maven artifacts?

2018-07-23 Thread Joseph Glanville
Hi,

Is the release process for publishing Maven artifacts documented somewhere?
We have been building tar archives with `mvn package` successfully but we
would like to publish our own Maven artifacts also, including `-sources`
JARs so we can add them as dependencies to projects outside of the Druid
source tree.

Joseph.


[GitHub] jihoonson commented on issue #6028: Error in SqlMetadataRuleManagerTest

2018-07-23 Thread GitBox
jihoonson commented on issue #6028: Error in SqlMetadataRuleManagerTest
URL: 
https://github.com/apache/incubator-druid/issues/6028#issuecomment-407166896
 
 
   Thanks @leventov. Probably `SQLMetadataSegmentManager` has the same bug. 
I'll check it.





[GitHub] jihoonson commented on a change in pull request #6033: Synchronize scheduled poll() calls in SQLMetadataRuleManager to prevent flakiness in SqlMetadataRuleManagerTest

2018-07-23 Thread GitBox
jihoonson commented on a change in pull request #6033: Synchronize scheduled 
poll() calls in SQLMetadataRuleManager to prevent flakiness in 
SqlMetadataRuleManagerTest
URL: https://github.com/apache/incubator-druid/pull/6033#discussion_r204518835
 
 

 ##
 File path: server/src/main/java/io/druid/metadata/SQLMetadataRuleManager.java
 ##
 @@ -142,13 +139,19 @@ public Void withHandle(Handle handle) throws Exception
   private final AuditManager auditManager;
 
   private final Object lock = new Object();
-
-  private volatile boolean started = false;
-
-  private volatile ListeningScheduledExecutorService exec = null;
-  private volatile ListenableFuture future = null;
-
-  private volatile long retryStartTime = 0;
+  /** The number of times this SQLMetadataRuleManager was started. */
+  private long startCount = 0;
+  /**
+   * Equal to the current {@link #startCount} value, if the 
SQLMetadataRuleManager is currently started; -1 if
+   * currently stopped.
+   *
+   * This field is used to implement a simple stamp mechanism instead of just 
a boolean "started" flag to prevent
+   * the theoretical situation of two tasks scheduled in {@link #start()} 
calling {@link #poll()} concurrently, if
+   * the sequence of {@link #start()} - {@link #stop()} - {@link #start()} 
actions occurs quickly.
+   */
+  private long currentStartOrder = -1;
+  private ScheduledExecutorService exec = null;
+  private long retryStartTime = 0;
 
 Review comment:
   nit: looks like this can be a local variable.
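
The start-order stamp described in the javadoc above can be sketched as a
standalone toy (an illustration, not the actual SQLMetadataRuleManager code):
each start() bumps a counter and hands the new value to the task it schedules;
a poll runs only while its captured stamp matches, so a poll scheduled before a
quick stop()/start() cycle becomes a no-op instead of running concurrently with
the new one.

```java
class StampedPoller
{
  private final Object lock = new Object();
  private long startCount = 0;         // times start() has been called
  private long currentStartOrder = -1; // -1 while stopped

  /** Starts the poller and returns the stamp the scheduled task should hold. */
  long start()
  {
    synchronized (lock) {
      currentStartOrder = ++startCount;
      return currentStartOrder;
    }
  }

  void stop()
  {
    synchronized (lock) {
      currentStartOrder = -1;
    }
  }

  /** A scheduled poll calls this first; a stale stamp means "skip this run". */
  boolean mayPoll(long myStartOrder)
  {
    synchronized (lock) {
      return myStartOrder == currentStartOrder;
    }
  }
}
```

With a plain boolean "started" flag, a task from the first start() could slip
through after a fast stop()/start(); with the stamp, its captured value no
longer matches and it exits immediately.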





[GitHub] jihoonson commented on issue #6036: use S3 as a backup storage for hdfs deep storage

2018-07-23 Thread GitBox
jihoonson commented on issue #6036: use S3 as a backup storage for hdfs deep 
storage
URL: https://github.com/apache/incubator-druid/pull/6036#issuecomment-407161433
 
 
   Hi @gaodayue, thanks for the PR. I have a question.
   
   > In many organization, Hadoop and HDFS are typically used in offline data 
analysis, while Druid is targeting online data serving. Thus SLA provided by 
HDFS often can't meet the needs of Druid. 
   
   - I think, if this is the case, you might need to somehow increase write 
throughput of your HDFS or use a separate deep storage. If the first option is 
not available for you, does it make sense to use only S3 as your deep storage?
   
   Regarding the idea of this PR, I'm not sure it is a good one. Maybe we need to 
define the concept of backup deep storage for all deep storage types and 
support it. Maybe the primary deep storage and backup deep storage should be in 
sync automatically. 
   
   But this PR is restricted to supporting it for only HDFS deep storage, and 
looks to require another tool, called `restore-hdfs-segment`, to keep all 
segments residing in HDFS. This would need additional operations, which makes 
Druid operation more difficult. 
   
   Side comment: Kafka indexing service guarantees exactly-once data ingestion, 
and thus data loss is never expected to happen. If deep storage is not 
available, all attempts to publish segments would fail and every task should 
restart from the same offset when publishing failed. This requires reprocessing 
the same data, which can make the ingestion slow, but there should be no data 
loss or duplication. 





Re: Build failure on 0.13.SNAPSHOT

2018-07-23 Thread Jihoon Son
I'm also using Maven 3.5.2 and not using any special configuration for
Maven, but I have never seen that error either.
Most of our Travis jobs have been working with only 512 MB of direct
memory. Only the 'strict compilation' Travis job requires 3 GB of memory.

I think it's worthwhile to look into this more. Maybe we somehow use more
memory when we run all tests with 'mvn install'. Maybe this relates to the
frequent transient failures of the 'processing module test', one of our Travis
jobs.

Jihoon

On Mon, Jul 23, 2018 at 9:32 AM Gian Merlino  wrote:

> Interesting. FWIW, I am using Maven 3.5.2 for building Druid and it has
> been working for me. I don't think I'm using any special Maven
> overrides (at least, I don't see anything interesting in my ~/.m2 directory
> or in my environment variables). It might have to do with how much memory
> our machines have? I do most of my builds on a Mac with 16GB RAM. Maybe try
> checking .travis.yml in the druid repo. It sets -Xmx3000m for mvn install
> commands, which might be needed in lower-memory environments.
>
> $ mvn --version
> Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d;
> 2017-10-18T00:58:13-07:00)
> Maven home: /usr/local/Cellar/maven/3.5.2/libexec
> Java version: 1.8.0_161, vendor: Oracle Corporation
> Java home:
> /Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home/jre
> Default locale: en_US, platform encoding: UTF-8
> OS name: "mac os x", version: "10.13.5", arch: "x86_64", family: "mac"
>
> On Mon, Jul 23, 2018 at 6:40 AM Dongjin Lee  wrote:
>
> > Finally, it seems like I found the reason. It was a composition of
> several
> > problems:
> >
> > - Druid should not be built with maven 3.5.x. With 3.5.2, Test suites
> like
> > `GroupByQueryRunnerFailureTest` fails. After I switched into 3.3.9 which
> is
> > built in the latest version of IntelliJ, those errors disappeared. It
> seems
> > like maven 3.5.x is not stable yet - it applied a drastic change, and it
> is
> > also why they skipped 3.4.x.
> > - It seems like Druid requires some MaxDirectMemorySize configuration for
> > some test suites. With some JVM parameter like
> `-XX:MaxDirectMemorySize=4g`
> > some test suites were passed, but not all. I am now trying the other
> > options with enlarged swap space.
> >
> > Question: How much MaxDirectMemorySize configuration are you using?
> >
> > Best,
> > Dongjin
> >
> > On Sat, Jul 21, 2018 at 3:01 AM Jihoon Son  wrote:
> >
> > > Hi Dongjin,
> > >
> > > that is weird. It looks like the vm crashed because of out of memory
> > while
> > > testing.
> > > It might be a real issue or not.
> > > Have you set any memory configuration for your maven?
> > >
> > > Jihoon
> > >
> > > On Thu, Jul 19, 2018 at 7:09 PM Dongjin Lee 
> wrote:
> > >
> > > > Hi Jihoon,
> > > >
> > > > I ran `mvn clean package` following development/build
> > > > <
> > > >
> > >
> >
> https://github.com/apache/incubator-druid/blob/master/docs/content/development/build.md
> > > > >
> > > > .
> > > >
> > > > Dongjin
> > > >
> > > > On Fri, Jul 20, 2018 at 12:30 AM Jihoon Son 
> > > wrote:
> > > >
> > > > > Hi Dongjin,
> > > > >
> > > > > what maven command did you run?
> > > > >
> > > > > Jihoon
> > > > >
> > > > > On Wed, Jul 18, 2018 at 10:38 PM Dongjin Lee 
> > > wrote:
> > > > >
> > > > > > Hello. I am trying to build druid, but it fails. My environment
> is
> > > like
> > > > > the
> > > > > > following:
> > > > > >
> > > > > > - CPU: Intel(R) Core(TM) i7-7560U CPU @ 2.40GHz
> > > > > > - RAM: 7704 MB
> > > > > > - OS: ubuntu 18.04
> > > > > > - JDK: openjdk version "1.8.0_171" (default configuration, with
> > > > > MaxHeapSize
> > > > > > = 1928 MB)
> > > > > > - Branch: master (commit: cd8ea3d)
> > > > > >
> > > > > > The error message I got is:
> > > > > >
> > > > > > [INFO]
> > > > > > >
> > > > >
> > >
> 
> > > > > > > [INFO] Reactor Summary:
> > > > > > > [INFO]
> > > > > > > [INFO] io.druid:druid .
> > > SUCCESS [
> > > > > > > 50.258 s]
> > > > > > > [INFO] java-util ..
> > SUCCESS
> > > > > > [03:57
> > > > > > > min]
> > > > > > > [INFO] druid-api ..
> > > SUCCESS [
> > > > > > > 22.694 s]
> > > > > > > [INFO] druid-common ...
> > > SUCCESS [
> > > > > > > 14.083 s]
> > > > > > > [INFO] druid-hll ..
> > > SUCCESS [
> > > > > > > 17.126 s]
> > > > > > > [INFO] extendedset 
> > > SUCCESS [
> > > > > > > 10.856 s]
> > > > > > >
> > > > > > > *[INFO] druid-processing ...
> > > FAILURE
> > > > > > > [04:36 min]*[INFO] druid-aws-common
> > > > ...
> > > > > > > SKIPPED
> > > > > > > [INFO] druid-server ...
> > SKIPPED
> > > > > > > [INFO] druid-examples ...

[GitHub] RestfulBlue commented on issue #5006: No protections for select query

2018-07-23 Thread GitBox
RestfulBlue commented on issue #5006: No protections for select query
URL: 
https://github.com/apache/incubator-druid/issues/5006#issuecomment-407127405
 
 
   Any updates? Druid can still be killed by OOM using a simple select query. Of 
course I can use restrictions, but not all users understand what they are 
doing, and the fact that users can very easily take down the entire cluster 
does not please me. Is it really that hard to implement a restriction for 
select queries in configuration? 





Re: Build failure on 0.13.SNAPSHOT

2018-07-23 Thread Gian Merlino
Interesting. FWIW, I am using Maven 3.5.2 for building Druid and it has
been working for me. I don't think I'm using any special Maven
overrides (at least, I don't see anything interesting in my ~/.m2 directory
or in my environment variables). It might have to do with how much memory
our machines have? I do most of my builds on a Mac with 16GB RAM. Maybe try
checking .travis.yml in the druid repo. It sets -Xmx3000m for mvn install
commands, which might be needed in lower-memory environments.

$ mvn --version
Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d;
2017-10-18T00:58:13-07:00)
Maven home: /usr/local/Cellar/maven/3.5.2/libexec
Java version: 1.8.0_161, vendor: Oracle Corporation
Java home:
/Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "mac os x", version: "10.13.5", arch: "x86_64", family: "mac"

On Mon, Jul 23, 2018 at 6:40 AM Dongjin Lee  wrote:

> Finally, it seems like I found the reason. It was a composition of several
> problems:
>
> - Druid should not be built with maven 3.5.x. With 3.5.2, Test suites like
> `GroupByQueryRunnerFailureTest` fails. After I switched into 3.3.9 which is
> built in the latest version of IntelliJ, those errors disappeared. It seems
> like maven 3.5.x is not stable yet - it applied a drastic change, and it is
> also why they skipped 3.4.x.
> - It seems like Druid requires some MaxDirectMemorySize configuration for
> some test suites. With some JVM parameter like `-XX:MaxDirectMemorySize=4g`
> some test suites were passed, but not all. I am now trying the other
> options with enlarged swap space.
>
> Question: How much MaxDirectMemorySize configuration are you using?
>
> Best,
> Dongjin
>
> On Sat, Jul 21, 2018 at 3:01 AM Jihoon Son  wrote:
>
> > Hi Dongjin,
> >
> > that is weird. It looks like the vm crashed because of out of memory
> while
> > testing.
> > It might be a real issue or not.
> > Have you set any memory configuration for your maven?
> >
> > Jihoon
> >
> > On Thu, Jul 19, 2018 at 7:09 PM Dongjin Lee  wrote:
> >
> > > Hi Jihoon,
> > >
> > > I ran `mvn clean package` following development/build
> > > <
> > >
> >
> https://github.com/apache/incubator-druid/blob/master/docs/content/development/build.md
> > > >
> > > .
> > >
> > > Dongjin
> > >
> > > On Fri, Jul 20, 2018 at 12:30 AM Jihoon Son 
> > wrote:
> > >
> > > > Hi Dongjin,
> > > >
> > > > what maven command did you run?
> > > >
> > > > Jihoon
> > > >
> > > > On Wed, Jul 18, 2018 at 10:38 PM Dongjin Lee 
> > wrote:
> > > >
> > > > > Hello. I am trying to build druid, but it fails. My environment is
> > like
> > > > the
> > > > > following:
> > > > >
> > > > > - CPU: Intel(R) Core(TM) i7-7560U CPU @ 2.40GHz
> > > > > - RAM: 7704 MB
> > > > > - OS: ubuntu 18.04
> > > > > - JDK: openjdk version "1.8.0_171" (default configuration, with
> > > > MaxHeapSize
> > > > > = 1928 MB)
> > > > > - Branch: master (commit: cd8ea3d)
> > > > >
> > > > > The error message I got is:
> > > > >
> > > > > [INFO]
> > > > > >
> > > >
> > 
> > > > > > [INFO] Reactor Summary:
> > > > > > [INFO]
> > > > > > [INFO] io.druid:druid .
> > SUCCESS [
> > > > > > 50.258 s]
> > > > > > [INFO] java-util ..
> SUCCESS
> > > > > [03:57
> > > > > > min]
> > > > > > [INFO] druid-api ..
> > SUCCESS [
> > > > > > 22.694 s]
> > > > > > [INFO] druid-common ...
> > SUCCESS [
> > > > > > 14.083 s]
> > > > > > [INFO] druid-hll ..
> > SUCCESS [
> > > > > > 17.126 s]
> > > > > > [INFO] extendedset 
> > SUCCESS [
> > > > > > 10.856 s]
> > > > > >
> > > > > > *[INFO] druid-processing ...
> > FAILURE
> > > > > > [04:36 min]*[INFO] druid-aws-common
> > > ...
> > > > > > SKIPPED
> > > > > > [INFO] druid-server ...
> SKIPPED
> > > > > > [INFO] druid-examples .
> SKIPPED
> > > > > > ...
> > > > > > [INFO]
> > > > > >
> > > >
> > 
> > > > > > [INFO] BUILD FAILURE
> > > > > > [INFO]
> > > > > >
> > > >
> > 
> > > > > > [INFO] Total time: 10:29 min
> > > > > > [INFO] Finished at: 2018-07-19T13:23:31+09:00
> > > > > > [INFO] Final Memory: 88M/777M
> > > > > > [INFO]
> > > > > >
> > > >
> > 
> > > > > >
> > > > > > *[ERROR] Failed to execute goal
> > > > > > org.apache.maven.plugins:maven-surefire-plugin:2.19.1:test
> > > > (default-test)
> > > > > > on project druid-processing: Execution default-test of goal
> > > > > > or

Re: Build failure on 0.13.SNAPSHOT

2018-07-23 Thread Dongjin Lee
Finally, it seems like I found the reason. It was a combination of several
problems:

- Druid should not be built with maven 3.5.x. With 3.5.2, test suites like
`GroupByQueryRunnerFailureTest` fail. After I switched to 3.3.9, which is
bundled with the latest version of IntelliJ, those errors disappeared. It seems
like maven 3.5.x is not stable yet: it applied a drastic change, and that is
also why they skipped 3.4.x.
- It seems like Druid requires some MaxDirectMemorySize configuration for
some test suites. With a JVM parameter like `-XX:MaxDirectMemorySize=4g`,
some test suites passed, but not all. I am now trying the other
options with enlarged swap space.

Question: what MaxDirectMemorySize setting are you using?

Best,
Dongjin

On Sat, Jul 21, 2018 at 3:01 AM Jihoon Son  wrote:

> Hi Dongjin,
>
> that is weird. It looks like the vm crashed because of out of memory while
> testing.
> It might be a real issue or not.
> Have you set any memory configuration for your maven?
>
> Jihoon
>
> On Thu, Jul 19, 2018 at 7:09 PM Dongjin Lee  wrote:
>
> > Hi Jihoon,
> >
> > I ran `mvn clean package` following development/build
> > <
> >
> https://github.com/apache/incubator-druid/blob/master/docs/content/development/build.md
> > >
> > .
> >
> > Dongjin
> >
> > On Fri, Jul 20, 2018 at 12:30 AM Jihoon Son 
> wrote:
> >
> > > Hi Dongjin,
> > >
> > > what maven command did you run?
> > >
> > > Jihoon
> > >
> > > On Wed, Jul 18, 2018 at 10:38 PM Dongjin Lee 
> wrote:
> > >
> > > > Hello. I am trying to build druid, but it fails. My environment is
> like
> > > the
> > > > following:
> > > >
> > > > - CPU: Intel(R) Core(TM) i7-7560U CPU @ 2.40GHz
> > > > - RAM: 7704 MB
> > > > - OS: ubuntu 18.04
> > > > - JDK: openjdk version "1.8.0_171" (default configuration, with
> > > MaxHeapSize
> > > > = 1928 MB)
> > > > - Branch: master (commit: cd8ea3d)
> > > >
> > > > The error message I got is:
> > > >
> > > > [INFO]
> > > > >
> > >
> 
> > > > > [INFO] Reactor Summary:
> > > > > [INFO]
> > > > > [INFO] io.druid:druid .
> SUCCESS [
> > > > > 50.258 s]
> > > > > [INFO] java-util .. SUCCESS
> > > > [03:57
> > > > > min]
> > > > > [INFO] druid-api ..
> SUCCESS [
> > > > > 22.694 s]
> > > > > [INFO] druid-common ...
> SUCCESS [
> > > > > 14.083 s]
> > > > > [INFO] druid-hll ..
> SUCCESS [
> > > > > 17.126 s]
> > > > > [INFO] extendedset 
> SUCCESS [
> > > > > 10.856 s]
> > > > >
> > > > > *[INFO] druid-processing ...
> FAILURE
> > > > > [04:36 min]*[INFO] druid-aws-common
> > ...
> > > > > SKIPPED
> > > > > [INFO] druid-server ... SKIPPED
> > > > > [INFO] druid-examples . SKIPPED
> > > > > ...
> > > > > [INFO]
> > > > >
> > >
> 
> > > > > [INFO] BUILD FAILURE
> > > > > [INFO]
> > > > >
> > >
> 
> > > > > [INFO] Total time: 10:29 min
> > > > > [INFO] Finished at: 2018-07-19T13:23:31+09:00
> > > > > [INFO] Final Memory: 88M/777M
> > > > > [INFO]
> > > > >
> > >
> 
> > > > >
> > > > > *[ERROR] Failed to execute goal
> > > > > org.apache.maven.plugins:maven-surefire-plugin:2.19.1:test
> > > (default-test)
> > > > > on project druid-processing: Execution default-test of goal
> > > > > org.apache.maven.plugins:maven-surefire-plugin:2.19.1:test failed:
> > The
> > > > > forked VM terminated without properly saying goodbye. VM crash or
> > > > > System.exit called?*[ERROR] Command was /bin/sh -c cd
> > > > > /home/djlee/workspace/java/druid/processing &&
> > > > > /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xmx3000m
> > > > -Duser.language=en
> > > > > -Duser.country=US -Dfile.encoding=UTF-8 -Duser.timezone=UTC
> > > > > -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
> > > > > -Ddruid.indexing.doubleStorage=double -jar
> > > > >
> > > >
> > >
> >
> /home/djlee/workspace/java/druid/processing/target/surefire/surefirebooter1075382243904099051.jar
> > > > >
> > > >
> > >
> >
> /home/djlee/workspace/java/druid/processing/target/surefire/surefire559351134757209tmp
> > > > >
> > > >
> > >
> >
> /home/djlee/workspace/java/druid/processing/target/surefire/surefire_5173894389718744688tmp
> > > >
> > > >
> > > > It seems like it fails when it runs tests on the `druid-processing`
> > > > module, but I can't be certain. Is there anyone who can give me some
> > > > hints? Thanks in advance.
> > > >
> > > > Best,
> > > > Dongjin
> > > >
> > > > --
> > > > *Dongjin Lee*

[GitHub] ashukhira commented on issue #6025: Druid Query Error

2018-07-23 Thread GitBox
ashukhira commented on issue #6025: Druid Query Error
URL: 
https://github.com/apache/incubator-druid/issues/6025#issuecomment-407011699
 
 
   Hi,
   On the Druid forum, somebody provided a solution for the above query error: 
removing the limitSpec column node from the query gives the proper result.





Re: synchronization question about datasketches aggregator

2018-07-23 Thread Anastasia Braginsky
 Hi Guys,
Just wanted to draw your attention to the fact that once OakIncrementalIndex is 
in place, there will be no need to manage synchronization between aggregators 
and ingestion. Part of Oak's benefit is synchronization for simultaneous writes 
and reads of the same key in the map.

On Thursday, July 19, 2018, 10:17:18 PM GMT+3, Gian Merlino 
 wrote:  
 
 Hi Will,

Check out also this thread for related discussion:

https://lists.apache.org/thread.html/9899aa790a7eb561ab66f47b35c8f66ffe695432719251351339521a@%3Cdev.druid.apache.org%3E

On Thu, Jul 19, 2018 at 11:21 AM Will Lauer  wrote:

> A colleague recently pointed out to me that all the sketch operations that
> take place in SketchAggregator (in the datasketches module) use a
> SynchronizedUnion class that basically wraps a normal sketch Union and
> synchronizes all operations. From what I can tell from other aggregators in
> the Druid code base, there doesn't appear to be a need to synchronize. It
> looks like Aggregators are always processed from within a single thread. Is
> it reasonable to remove all the synchronizations from the SketchAggregator
> and avoid the performance hit that they impose at runtime?
>
> Will
>
> Will Lauer
> Senior Principal Architect
>
> Progress requires pain
>
> m: 508.561.6427
>
> o: 217.255.4262
>
  

[GitHub] gaodayue opened a new pull request #6036: use S3 as a backup storage for hdfs deep storage

2018-07-23 Thread GitBox
gaodayue opened a new pull request #6036: use S3 as a backup storage for hdfs 
deep storage
URL: https://github.com/apache/incubator-druid/pull/6036
 
 
   This PR improves the overall availability of hdfs-deep-storage by pushing 
data to S3 when HDFS is temporarily unavailable.
   
   # Motivation
   
   In many organizations, Hadoop and HDFS are typically used for offline data 
analysis, while Druid targets online data serving. Thus the SLA provided by 
HDFS often can't meet the needs of Druid. Consequently, users of 
hdfs-deep-storage often encounter task failures due to HDFS being temporarily 
unavailable. Task failures can cause data re-processing or even data loss, 
depending on whether kafka-indexing-service or tranquility is used for realtime 
ingestion.
   
   # Goal
   
   Make segment handover continue to work even if HDFS is not available.
   
   # Approach taken by this PR
   
   We leverage the S3AFileSystem provided by the HDFS client library to support 
using S3 as a backup storage for HDFS. When we can't push segments or task logs 
to HDFS, we switch to S3 instead. By using S3 as a backup for HDFS, the overall 
availability of hdfs-deep-storage is increased.
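
   The push-time fallback described above can be sketched as follows (class and 
method names are invented for illustration; the real PR works against Hadoop's 
FileSystem API and records the resulting s3a:// path in the segment's loadSpec):

```java
// Sketch of the fallback flow: try the primary HDFS push, and on failure push
// to the S3 backup location instead, returning whichever path succeeded.
class FallbackPusher
{
  interface Pusher
  {
    String push(String segmentId); // returns the final storage path
  }

  private final Pusher primaryHdfs;
  private final Pusher s3Backup;

  FallbackPusher(Pusher primaryHdfs, Pusher s3Backup)
  {
    this.primaryHdfs = primaryHdfs;
    this.s3Backup = s3Backup;
  }

  String push(String segmentId)
  {
    try {
      return primaryHdfs.push(segmentId); // normal case: segment lands on HDFS
    }
    catch (RuntimeException e) {
      // HDFS temporarily unavailable: fall back to the S3 backup location.
      return s3Backup.push(segmentId);
    }
  }
}
```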
   
   For segments pushed to S3, loadSpec is changed to `{"type":"hdfs", 
"path":"s3a://..."}`. Since file access is done with FileSystem abstraction, 
there is no need to change HdfsDataSegmentPuller.
   
   The following new configuration knobs are added to hdfs-deep-storage and 
HDFS task logs; please refer to the doc changes for details:
   * druid.storage.useS3Backup
   * druid.storage.backupS3Bucket
   * druid.storage.backupS3BaseKey
   * druid.indexer.logs.useS3Backup
   * druid.indexer.logs.backupS3Bucket
   * druid.indexer.logs.backupS3BaseKey
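   
   For illustration, a hypothetical common.runtime.properties snippet enabling 
both backups might look like the following (the bucket name and key prefixes 
are placeholders, and the exact property semantics are defined by the PR's doc 
changes):

```properties
# Hypothetical example configuration; values below are placeholders.
druid.storage.type=hdfs
druid.storage.useS3Backup=true
druid.storage.backupS3Bucket=my-druid-backup
druid.storage.backupS3BaseKey=druid/segments

druid.indexer.logs.type=hdfs
druid.indexer.logs.useS3Backup=true
druid.indexer.logs.backupS3Bucket=my-druid-backup
druid.indexer.logs.backupS3BaseKey=druid/indexing-logs
```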
   
   Besides what's included in this PR, I've also implemented a tool called 
`restore-hdfs-segment` to migrate segments temporarily pushed to S3 back to 
HDFS. This can free up space in S3 as well as make all segments eventually 
reside on HDFS. If you like the idea, I can send another PR for the tool later.

