[GitHub] gaodayue commented on issue #6036: use S3 as a backup storage for hdfs deep storage
gaodayue commented on issue #6036: use S3 as a backup storage for hdfs deep storage
URL: https://github.com/apache/incubator-druid/pull/6036#issuecomment-407288499

Hi @jihoonson, thanks for your comments. Answering your questions:

> I think, if this is the case, you might need to somehow increase write throughput of your HDFS or use a separate deep storage. If the first option is not available for you, does it make sense to use only S3 as your deep storage?

Our company operates its own big Hadoop cluster (>5k nodes) for us to use. Switching to s3-deep-storage would incur extra cost and is not an option for us.

> Maybe we need to define the concept of backup deep storage for all deep storage types and support it.

I thought about implementing something like a composite-deep-storage that could add backup abilities to all deep storages, but found it non-trivial to load multiple deep storage extensions inside composite-deep-storage. So I decided to add support for hdfs-deep-storage only, simply because that's what we're using.

> Maybe the primary deep storage and backup deep storage should be in sync automatically.

What do you mean by "in sync"? Do you mean all segments pushed to the backup storage should eventually be copied back to the primary storage? If that's the case, I don't think there is a strong need for it (explained below).

> But this PR supports it only for HDFS deep storage and appears to require another tool, called `restore-hdfs-segment`, to keep all segments residing in HDFS. This would add operational steps that make running Druid more difficult.

First, the restore-hdfs-segment tool is not required to achieve the goal of HDFS fault tolerance. I developed it for other reasons: one is to pay less for S3, and the other is that we occasionally need to migrate a datasource from one cluster to another, and we want all segments to reside on HDFS so that we can simply use the insert-segment-to-db tool to migrate them. Users who don't share these concerns can simply ignore restore-hdfs-segment.

Second, concerning operational complexity, I think it's just a trade-off between availability and cost, and the extra operational cost is as low as running restore-hdfs-segment manually after an HDFS failure, or setting up a daily crontab to run it (see the sketch at the end of this message).

> Kafka indexing service guarantees exactly-once data ingestion, and thus data loss is never expected to happen. If deep storage is not available, all attempts to publish segments would fail and every task should restart from the same offset when publishing failed.

Yeah, I'm aware of that. But for other reasons we are still using Tranquility as our main ingestion tool, and HDFS failures did cause data loss several times, which was a big pain for us. We added this feature to solve that problem, and I think it may also be useful for other people.
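[A minimal sketch of the daily crontab mentioned above. The restore-hdfs-segment tool's actual command-line interface is not shown in this thread, so the install path, arguments, and log location here are all hypothetical placeholders:]

```
# Hypothetical: run restore-hdfs-segment daily at 03:00; the real binary
# path and any required arguments are not specified in this thread.
0 3 * * * /opt/druid/bin/restore-hdfs-segment >> /var/log/druid/restore-hdfs-segment.log 2>&1
```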
[GitHub] gianm commented on issue #6037: snapshot quickstart error
gianm commented on issue #6037: snapshot quickstart error
URL: https://github.com/apache/incubator-druid/issues/6037#issuecomment-407269663

I have seen this happen with older versions of javac; perhaps try upgrading that (looks like your jdk is 1.8.0_60).
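[For anyone hitting the same VerifyError, a quick way to confirm which compiler and runtime are actually on the PATH before and after upgrading:]

```
# Check which JDK is compiling and running Druid; the issue above was seen
# with 1.8.0_60, so a newer patch release is worth trying.
javac -version
java -version
```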
Re: Release process for Maven artifacts?
Hi Joseph,

For official releases we just do `mvn release:prepare release:perform`. The poms encode everything that has to happen, including sources jars, javadocs jars, signing everything, and pushing it all up to Maven Central. You might be able to modify them to push to a different repo (see the sketch after the quoted message below).

On Mon, Jul 23, 2018 at 5:06 PM Joseph Glanville wrote:

> Hi,
>
> Is the release process for publishing Maven artifacts documented somewhere? We have been building tar archives with `mvn package` successfully but we would like to publish our own Maven artifacts also, including `-sources` JARs so we can add them as dependencies to projects outside of the Druid source tree.
>
> Joseph.
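[A sketch of the commands involved, under stated assumptions: `release:prepare release:perform` is quoted from the message above, while the `altDeploymentRepository` override is a standard maven-deploy-plugin property whose interaction with Druid's poms has not been verified here; the repo id and URL are placeholders:]

```
# Official release flow as described above:
mvn release:prepare release:perform

# Hypothetical: publish to an internal repo instead of Maven Central.
# -Darguments forwards flags to the forked build that release:perform runs;
# "internal", the "default" layout, and the URL below are placeholders.
mvn release:perform \
  -Darguments="-DaltDeploymentRepository=internal::default::https://repo.example.com/releases"
```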
[GitHub] AlexanderSaydakov opened a new issue #6037: snapshot quickstart error
AlexanderSaydakov opened a new issue #6037: snapshot quickstart error
URL: https://github.com/apache/incubator-druid/issues/6037

I am trying to follow the quick start guide (http://druid.io/docs/latest/tutorials/quickstart.html) using a snapshot built from master. All services appear to start and the wiki data indexing succeeds, but the datasource never becomes fully available (red dot in the console view). Looking at the logs, the following error in log/server-coordinator.log seems to be relevant:

```
2018-07-24T00:47:29,773 ERROR [LeaderSelector[/druid/coordinator/_COORDINATOR]] org.apache.curator.framework.listen.ListenerContainer - Listener (io.druid.curator.discovery.CuratorDruidLeaderSelector$1@7e15f4d4) threw an exception
java.lang.VerifyError: Bad type on operand stack
Exception Details:
  Location:
    io/druid/server/coordinator/DruidCoordinator$CoordinatorHistoricalManagerRunnable.<init>(Lio/druid/server/coordinator/DruidCoordinator;I)V @16: invokedynamic
  Reason:
    Type uninitializedThis (current frame, stack[3]) is not assignable to 'io/druid/server/coordinator/DruidCoordinator$CoordinatorHistoricalManagerRunnable'
  Current Frame:
    bci: @16
    flags: { flagThisUninit }
    locals: { uninitializedThis, 'io/druid/server/coordinator/DruidCoordinator', integer }
    stack: { uninitializedThis, 'io/druid/server/coordinator/DruidCoordinator', 'io/druid/server/coordinator/helper/DruidCoordinatorSegmentInfoLoader', uninitializedThis }
  Bytecode:
    0x000: 2a2b b500 012a 2bbb 0002 592b b700 032a
    0x010: ba00 0400 00bb 0005 592b b700 06bb 0007
    0x020: 592b b700 08bb 0009 592b b700 0abb 000b
    0x030: 592b b700 0cbb 000d 592b b700 0eb8 000f
    0x040: 1cb7 0010 b1

	at io.druid.server.coordinator.DruidCoordinator.becomeLeader(DruidCoordinator.java:540) ~[druid-server-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
	at io.druid.server.coordinator.DruidCoordinator.access$000(DruidCoordinator.java:96) ~[druid-server-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
	at io.druid.server.coordinator.DruidCoordinator$1.becomeLeader(DruidCoordinator.java:495) ~[druid-server-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
	at io.druid.curator.discovery.CuratorDruidLeaderSelector$1.isLeader(CuratorDruidLeaderSelector.java:98) ~[druid-server-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
	at org.apache.curator.framework.recipes.leader.LeaderLatch$9.apply(LeaderLatch.java:665) ~[curator-recipes-4.0.0.jar:4.0.0]
	at org.apache.curator.framework.recipes.leader.LeaderLatch$9.apply(LeaderLatch.java:661) ~[curator-recipes-4.0.0.jar:4.0.0]
	at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93) [curator-framework-4.0.0.jar:4.0.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_60]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_60]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_60]
```
Release process for Maven artifacts?
Hi,

Is the release process for publishing Maven artifacts documented somewhere? We have been building tar archives with `mvn package` successfully, but we would also like to publish our own Maven artifacts, including `-sources` JARs, so we can add them as dependencies to projects outside of the Druid source tree.

Joseph.
[GitHub] jihoonson commented on issue #6028: Error in SqlMetadataRuleManagerTest
jihoonson commented on issue #6028: Error in SqlMetadataRuleManagerTest
URL: https://github.com/apache/incubator-druid/issues/6028#issuecomment-407166896

Thanks @leventov. Probably `SQLMetadataSegmentManager` has the same bug. I'll check it.
[GitHub] jihoonson commented on a change in pull request #6033: Synchronize scheduled poll() calls in SQLMetadataRuleManager to prevent flakiness in SqlMetadataRuleManagerTest
jihoonson commented on a change in pull request #6033: Synchronize scheduled poll() calls in SQLMetadataRuleManager to prevent flakiness in SqlMetadataRuleManagerTest
URL: https://github.com/apache/incubator-druid/pull/6033#discussion_r204518835

File path: server/src/main/java/io/druid/metadata/SQLMetadataRuleManager.java

```
@@ -142,13 +139,19 @@ public Void withHandle(Handle handle) throws Exception
   private final AuditManager auditManager;

   private final Object lock = new Object();
-
-  private volatile boolean started = false;
-
-  private volatile ListeningScheduledExecutorService exec = null;
-  private volatile ListenableFuture future = null;
-
-  private volatile long retryStartTime = 0;
+  /** The number of times this SQLMetadataRuleManager was started. */
+  private long startCount = 0;
+  /**
+   * Equal to the current {@link #startCount} value, if the SQLMetadataRuleManager is currently started; -1 if
+   * currently stopped.
+   *
+   * This field is used to implement a simple stamp mechanism instead of just a boolean "started" flag to prevent
+   * the theoretical situation of two tasks scheduled in {@link #start()} calling {@link #poll()} concurrently, if
+   * the sequence of {@link #start()} - {@link #stop()} - {@link #start()} actions occurs quickly.
+   */
+  private long currentStartOrder = -1;
+  private ScheduledExecutorService exec = null;
+  private long retryStartTime = 0;
```

Review comment: nit: looks like this can be a local variable.
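[For readers following along, a minimal standalone sketch of the stamp mechanism the javadoc above describes — not the actual SQLMetadataRuleManager code, just an illustration of why a monotonically increasing start stamp prevents a poll() scheduled by an old start() from racing with a new one:]

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class StampedPoller
{
  private final Object lock = new Object();
  private long startCount = 0;
  private long currentStartOrder = -1; // -1 means stopped
  private ScheduledExecutorService exec = null;

  public void start()
  {
    synchronized (lock) {
      if (currentStartOrder >= 0) {
        return; // already started
      }
      final long localStartOrder = ++startCount;
      currentStartOrder = localStartOrder;
      exec = Executors.newSingleThreadScheduledExecutor();
      exec.scheduleWithFixedDelay(() -> poll(localStartOrder), 0, 60, TimeUnit.SECONDS);
    }
  }

  public void stop()
  {
    synchronized (lock) {
      if (currentStartOrder < 0) {
        return; // already stopped
      }
      currentStartOrder = -1;
      exec.shutdownNow();
      exec = null;
    }
  }

  private void poll(long expectedStartOrder)
  {
    synchronized (lock) {
      // A task scheduled by an earlier start() sees a stale stamp after a
      // quick stop()/start() cycle and becomes a no-op — something a plain
      // boolean "started" flag could not distinguish.
      if (currentStartOrder != expectedStartOrder) {
        return;
      }
      // ... do the actual polling work under the lock ...
    }
  }
}
```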
[GitHub] jihoonson commented on issue #6036: use S3 as a backup storage for hdfs deep storage
jihoonson commented on issue #6036: use S3 as a backup storage for hdfs deep storage
URL: https://github.com/apache/incubator-druid/pull/6036#issuecomment-407161433

Hi @gaodayue, thanks for the PR. I have a question.

> In many organizations, Hadoop and HDFS are typically used in offline data analysis, while Druid targets online data serving. Thus the SLA provided by HDFS often can't meet the needs of Druid.

I think, if this is the case, you might need to somehow increase write throughput of your HDFS or use a separate deep storage. If the first option is not available for you, does it make sense to use only S3 as your deep storage?

As for the idea of this PR, I'm not sure it is a good one. Maybe we need to define the concept of backup deep storage for all deep storage types and support it. Maybe the primary deep storage and backup deep storage should be in sync automatically. But this PR supports it only for HDFS deep storage and appears to require another tool, called `restore-hdfs-segment`, to keep all segments residing in HDFS. This would add operational steps that make running Druid more difficult.

Side comment: Kafka indexing service guarantees exactly-once data ingestion, and thus data loss is never expected to happen. If deep storage is not available, all attempts to publish segments would fail and every task should restart from the same offset when publishing failed. This requires reprocessing the same data, which can slow ingestion, but there should be no data loss or data duplication.
Re: Build failure on 0.13.SNAPSHOT
I'm also using Maven 3.5.2 without any special configuration for Maven, but I have never seen that error either. Most of our Travis jobs have been working with only 512 MB of direct memory; only the 'strict compilation' Travis job requires 3 GB of memory. I think it's worthwhile to look into this more. Maybe we somehow use more memory when we run all tests via 'mvn install'. Maybe this relates to the frequent transient failures of the 'processing module test', one of our Travis jobs.

Jihoon

On Mon, Jul 23, 2018 at 9:32 AM Gian Merlino wrote:

> Interesting. Fwiw, I am using Maven 3.5.2 for building Druid and it has been working for me. I don't think I'm using any special Maven overrides (at least, I don't see anything interesting in my ~/.m2 directory or in my environment variables). It might have to do with how much memory our machines have? I do most of my builds on a Mac with 16GB RAM. Maybe try checking .travis.yml in the druid repo. It sets -Xmx3000m for mvn install commands, which might be needed for lower-memory environments.
>
> $ mvn --version
> Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T00:58:13-07:00)
> Maven home: /usr/local/Cellar/maven/3.5.2/libexec
> Java version: 1.8.0_161, vendor: Oracle Corporation
> Java home: /Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home/jre
> Default locale: en_US, platform encoding: UTF-8
> OS name: "mac os x", version: "10.13.5", arch: "x86_64", family: "mac"
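[A sketch tying together the memory knobs mentioned in this thread; an illustration, not a project-recommended setting:]

```
# MAVEN_OPTS sizes the Maven JVM itself (what Dongjin asked about below).
# Note that the forked surefire test JVM gets its flags (e.g. the -Xmx3000m
# visible in the failing command line later in this thread) from the pom's
# argLine, not from MAVEN_OPTS, so test-JVM memory is controlled separately.
export MAVEN_OPTS="-Xmx3000m"
mvn clean package
```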
[GitHub] RestfulBlue commented on issue #5006: No protections for select query
RestfulBlue commented on issue #5006: No protections for select query
URL: https://github.com/apache/incubator-druid/issues/5006#issuecomment-407127405

Any updates? Druid can still be killed by OOM using a simple select query. Of course I can use restrictions, but not all users understand what they are doing, and the fact that users can so easily bring down the entire cluster does not please me. Is it really hard to implement a restriction for select queries in configuration?
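[For context, a per-query guard does exist: the select query's pagingSpec threshold bounds how many rows come back — but nothing server-side enforces it when callers omit or inflate it, which is the gap this issue raises. A minimal bounded query for illustration; the datasource name and interval are placeholders:]

```json
{
  "queryType": "select",
  "dataSource": "example_datasource",
  "intervals": ["2018-01-01/2018-01-02"],
  "granularity": "all",
  "dimensions": [],
  "metrics": [],
  "pagingSpec": {
    "pagingIdentifiers": {},
    "threshold": 100
  }
}
```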
Re: Build failure on 0.13.SNAPSHOT
Interesting. Fwiw, I am using Maven 3.5.2 for building Druid and it has been working for me. I don't think I'm using any special Maven overrides (at least, I don't see anything interesting in my ~/.m2 directory or in my environment variables). It might have to do with how much memory our machines have? I do most of my builds on a Mac with 16GB RAM. Maybe try checking .travis.yml in the druid repo. It sets -Xmx3000m for mvn install commands, which might be needed for lower-memory environments.

$ mvn --version
Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T00:58:13-07:00)
Maven home: /usr/local/Cellar/maven/3.5.2/libexec
Java version: 1.8.0_161, vendor: Oracle Corporation
Java home: /Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "mac os x", version: "10.13.5", arch: "x86_64", family: "mac"

On Mon, Jul 23, 2018 at 6:40 AM Dongjin Lee wrote:

> Finally, it seems like I found the reason. It was a composition of several problems:
>
> - Druid should not be built with Maven 3.5.x. With 3.5.2, test suites like `GroupByQueryRunnerFailureTest` fail. After I switched to 3.3.9, which is bundled with the latest version of IntelliJ, those errors disappeared. It seems like Maven 3.5.x is not stable yet - it applied a drastic change, which is also why they skipped 3.4.x.
> - It seems like Druid requires some MaxDirectMemorySize configuration for some test suites. With a JVM parameter like `-XX:MaxDirectMemorySize=4g` some test suites passed, but not all. I am now trying the other options with enlarged swap space.
>
> Question: How much MaxDirectMemorySize configuration are you using?
>
> Best,
> Dongjin
Re: Build failure on 0.13.SNAPSHOT
Finally, it seems like I found the reason. It was a composition of several problems:

- Druid should not be built with Maven 3.5.x. With 3.5.2, test suites like `GroupByQueryRunnerFailureTest` fail. After I switched to 3.3.9, which is bundled with the latest version of IntelliJ, those errors disappeared. It seems like Maven 3.5.x is not stable yet - it applied a drastic change, which is also why they skipped 3.4.x.
- It seems like Druid requires some MaxDirectMemorySize configuration for some test suites. With a JVM parameter like `-XX:MaxDirectMemorySize=4g` some test suites passed, but not all. I am now trying the other options with enlarged swap space.

Question: How much MaxDirectMemorySize configuration are you using?

Best,
Dongjin

On Sat, Jul 21, 2018 at 3:01 AM Jihoon Son wrote:

> Hi Dongjin,
>
> that is weird. It looks like the VM crashed because of out of memory while testing. It might be a real issue or not. Have you set any memory configuration for your Maven?
>
> Jihoon
>
> On Thu, Jul 19, 2018 at 7:09 PM Dongjin Lee wrote:
>
> > Hi Jihoon,
> >
> > I ran `mvn clean package` following development/build
> > <https://github.com/apache/incubator-druid/blob/master/docs/content/development/build.md>.
> >
> > Dongjin
> >
> > On Fri, Jul 20, 2018 at 12:30 AM Jihoon Son wrote:
> >
> > > Hi Dongjin,
> > >
> > > what maven command did you run?
> > >
> > > Jihoon
> > >
> > > On Wed, Jul 18, 2018 at 10:38 PM Dongjin Lee wrote:
> > >
> > > > Hello. I am trying to build druid, but it fails. My environment is like the following:
> > > >
> > > > - CPU: Intel(R) Core(TM) i7-7560U CPU @ 2.40GHz
> > > > - RAM: 7704 MB
> > > > - OS: ubuntu 18.04
> > > > - JDK: openjdk version "1.8.0_171" (default configuration, with MaxHeapSize = 1928 MB)
> > > > - Branch: master (commit: cd8ea3d)
> > > >
> > > > The error message I got is:
> > > >
> > > > [INFO] Reactor Summary:
> > > > [INFO] io.druid:druid ................................. SUCCESS [ 50.258 s]
> > > > [INFO] java-util ...................................... SUCCESS [03:57 min]
> > > > [INFO] druid-api ...................................... SUCCESS [ 22.694 s]
> > > > [INFO] druid-common ................................... SUCCESS [ 14.083 s]
> > > > [INFO] druid-hll ...................................... SUCCESS [ 17.126 s]
> > > > [INFO] extendedset .................................... SUCCESS [ 10.856 s]
> > > > [INFO] druid-processing ............................... FAILURE [04:36 min]
> > > > [INFO] druid-aws-common ............................... SKIPPED
> > > > [INFO] druid-server ................................... SKIPPED
> > > > [INFO] druid-examples ................................. SKIPPED
> > > > ...
> > > > [INFO] BUILD FAILURE
> > > > [INFO] Total time: 10:29 min
> > > > [INFO] Finished at: 2018-07-19T13:23:31+09:00
> > > > [INFO] Final Memory: 88M/777M
> > > >
> > > > [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.19.1:test (default-test) on project druid-processing: Execution default-test of goal org.apache.maven.plugins:maven-surefire-plugin:2.19.1:test failed: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
> > > > [ERROR] Command was /bin/sh -c cd /home/djlee/workspace/java/druid/processing && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xmx3000m -Duser.language=en -Duser.country=US -Dfile.encoding=UTF-8 -Duser.timezone=UTC -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -Ddruid.indexing.doubleStorage=double -jar /home/djlee/workspace/java/druid/processing/target/surefire/surefirebooter1075382243904099051.jar /home/djlee/workspace/java/druid/processing/target/surefire/surefire559351134757209tmp /home/djlee/workspace/java/druid/processing/target/surefire/surefire_5173894389718744688tmp
> > > >
> > > > It seems like it fails when it runs tests on the `druid-processing` module, but I can't be certain. Is there anyone who can give me some hints? Thanks in advance.
> > > >
> > > > Best,
> > > > Dongjin
[GitHub] ashukhira commented on issue #6025: Druid Query Error
ashukhira commented on issue #6025: Druid Query Error
URL: https://github.com/apache/incubator-druid/issues/6025#issuecomment-407011699

Hi, on the Druid forum somebody provided a solution for the query error above: after removing the limitSpec column node from the query, it gives the proper result.
Re: synchronization question about datasketches aggregator
Hi Guys,

Just wanted to draw your attention to the fact that once OakIncrementalIndex is in place, there will be no need to manage synchronization between aggregators and ingestion. Part of Oak's benefit is synchronization of simultaneous writes and reads of the same key in the map.

On Thursday, July 19, 2018, 10:17:18 PM GMT+3, Gian Merlino wrote:

Hi Will,

Check out also this thread for related discussion: https://lists.apache.org/thread.html/9899aa790a7eb561ab66f47b35c8f66ffe695432719251351339521a@%3Cdev.druid.apache.org%3E

On Thu, Jul 19, 2018 at 11:21 AM Will Lauer wrote:

> A colleague recently pointed out to me that all the sketch operations that take place in SketchAggregator (in the datasketches module) use a SynchronizedUnion class that basically wraps a normal sketch Union and synchronizes all operations. From what I can tell from other aggregators in the Druid code base, there doesn't appear to be a need to synchronize; it looks like Aggregators are always processed from within a single thread. Is it reasonable to remove all the synchronizations from the SketchAggregator and avoid the performance hit that they impose at runtime?
>
> Will
>
> Will Lauer
> Senior Principal Architect
>
> Progress requires pain
>
> m: 508.561.6427
> o: 217.255.4262
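[For readers unfamiliar with the class being discussed, a minimal sketch of the wrap-and-synchronize pattern Will describes; the Union interface here is a simplified stand-in for the DataSketches theta Union, not the actual Druid class:]

```java
// Simplified stand-in for the DataSketches theta Union API; the real
// interface has more methods, but the wrapping pattern is the same.
interface Union
{
  void update(Object sketchOrValue);

  Object getResult();
}

// Wraps a non-thread-safe Union and serializes every operation on it.
// This per-call monitor acquisition is the runtime cost Will is asking
// about removing if Aggregators really run single-threaded.
class SynchronizedUnion implements Union
{
  private final Union delegate;

  SynchronizedUnion(Union delegate)
  {
    this.delegate = delegate;
  }

  @Override
  public synchronized void update(Object sketchOrValue)
  {
    delegate.update(sketchOrValue);
  }

  @Override
  public synchronized Object getResult()
  {
    return delegate.getResult();
  }
}
```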
[GitHub] gaodayue opened a new pull request #6036: use S3 as a backup storage for hdfs deep storage
gaodayue opened a new pull request #6036: use S3 as a backup storage for hdfs deep storage
URL: https://github.com/apache/incubator-druid/pull/6036

This PR improves the overall availability of hdfs-deep-storage by pushing data to S3 when HDFS is temporarily unavailable.

# Motivation

In many organizations, Hadoop and HDFS are typically used in offline data analysis, while Druid targets online data serving. Thus the SLA provided by HDFS often can't meet the needs of Druid. Consequently, users of hdfs-deep-storage often encounter task failures due to temporary unavailability of HDFS. Task failures can cause data re-processing or even data loss, depending on whether kafka-indexing-service or tranquility is used for realtime ingestion.

# Goal

Make segment handover continue to work even when HDFS is not available.

# Approach taken by this PR

We leverage the S3AFileSystem provided by the HDFS client library to support using S3 as a backup storage for HDFS. When we can't push segments or task logs to HDFS, we switch to S3 instead. By using S3 as a backup for HDFS, the overall availability of hdfs-deep-storage is increased. For segments pushed to S3, the loadSpec is changed to `{"type":"hdfs", "path":"s3a://..."}`. Since file access goes through the FileSystem abstraction, there is no need to change HdfsDataSegmentPuller.

The following new configuration knobs are added to hdfs-deep-storage and hdfs task logs; please refer to the doc changes for details (a hypothetical example follows this list):

* druid.storage.useS3Backup
* druid.storage.backupS3Bucket
* druid.storage.backupS3BaseKey
* druid.indexer.logs.useS3Backup
* druid.indexer.logs.backupS3Bucket
* druid.indexer.logs.backupS3BaseKey

Besides what's included in this PR, I've also implemented a tool called `restore-hdfs-segment` to migrate segments temporarily pushed to S3 back to HDFS. This can free up space in S3 as well as make all segments eventually reside on HDFS. If you like the idea, I can send another PR for the tool later.
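[A sketch of what enabling the backup might look like in a runtime.properties file, based on the knob names listed above. The bucket, base-key, and HDFS path values are placeholders, and defaults/required-ness follow the PR's proposed docs rather than anything confirmed here:]

```
# Hypothetical hdfs-deep-storage configuration with the proposed S3 backup
# enabled; bucket and base-key values are placeholders.
druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://nameservice/druid/segments
druid.storage.useS3Backup=true
druid.storage.backupS3Bucket=my-druid-backup
druid.storage.backupS3BaseKey=druid/segments

# Same idea for task logs.
druid.indexer.logs.type=hdfs
druid.indexer.logs.directory=hdfs://nameservice/druid/indexing-logs
druid.indexer.logs.useS3Backup=true
druid.indexer.logs.backupS3Bucket=my-druid-backup
druid.indexer.logs.backupS3BaseKey=druid/indexing-logs
```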