Hi, I was able to run the code examples for loading data and searching. There were really no hiccups. I started digging through the code and will let you know if I have any questions. Thanks
- Rahul

On Thu, May 2, 2013 at 11:33 PM, rahul challapalli wrote:

> Hi Aaron,
>
> I greatly appreciate your detailed response. I will go through the notes,
> code, and the examples you provided over the weekend and will keep you
> posted regarding any issues that I come across. Once again, thank you.
>
> - Rahul
>
>
> On Thu, May 2, 2013 at 7:45 PM, Aaron McCurry wrote:
>
>> Rahul,
>>
>> I'm glad you were able to get things built and Blur up and running! Good
>> questions! Let me see if I can answer them.
>>
>> 1. I am not able to find the 'blur.*.hostname' properties in the
>> blur.properties file, but these are listed in the readme file
>>
>> The blur-site.properties file overrides the blur-default.properties file
>> that can be found in the src/blur-util/src/main/resources/ directory.
>>
>> 2. There seems to be a lot of code. I would greatly appreciate it if
>> someone can give me pointers before I dig through the codebase. Something
>> like an architectural overview or a flow explaining how the search query
>> is resolved.
>>
>> Good question. I will explain how a query is executed, assuming you are
>> running Blur in a clustered environment (controllers + shards).
>>
>> -1. Client creates a query (BlurQuery) with the generated Thrift objects.
>> -2. Client submits the query to one of the controllers by calling the
>> query method on the Blur service.
>>
>> Note the easiest way to interact with Thrift in the client is by using
>> BlurClient (org.apache.blur.thrift.BlurClient) in the blur-thrift project.
>> You can see it in use here (I just added it, so you might have to pull):
>>
>> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=blob;f=src/blur-testsuite/src/main/java/org/apache/blur/testsuite/SimpleQueryExample.java;h=df16bd82522bb9083309ac31f849924ba43318be;hb=ff29884ab60262305b99af49fae26575c808b755
>>
>> -3. Once the query arrives in the controller, the controller then
>> re-submits the query to all the shard servers that are online.
>>
>> See the query method in
>> src/blur-core/src/main/java/org/apache/blur/thrift/BlurControllerServer.java
>>
>> -4. Once in the shard server, the query is parsed into a Lucene query.
>> -5. The query is executed in parallel, one thread per index shard in the
>> shard server.
>>
>> See the query method in
>> src/blur-core/src/main/java/org/apache/blur/thrift/BlurShardServer.java
>>
>> -6. Once the results have been found from the query, they are merged and
>> the top N are returned to the controller.
>> -7. Once the results from all the shard servers have returned, the top N
>> are returned to the client.
>>
>> I know this is a technical explanation of running a single query, but it
>> should give you some starting points to dig through the code.
>>
>> The projects breakdown:
>>
>> blur-core
>> - This project binds most of the other projects together, houses all the
>> thrift service impls, failover logic, server startup, shard and controller
>> management, etc.
>> blur-gui
>> - An http status server that runs in each controller and shard server,
>> needs some work.
>> blur-mapred
>> - The bulk indexing code lives in this project.
>> blur-query
>> - The lucene query classes that blur implements reside here.
>> blur-shell
>> - A basic shell program to interact with blur, needs some more features.
>> blur-store
>> - The lucene directory and block cache code resides here.
>> blur-testsuite
>> - Currently contains a lot of example programs to exercise a blur cluster.
>> blur-thrift
>> - Contains generated thrift code and client code, the client code has
>> automatic retry logic for when you are running multiple controllers, etc.
>> blur-util
>> - Contains some basic utility classes, metrics, and zookeeper code.
>>
>>
>> 3. How do you guys manage your development workspace with eclipse, git,
>> and maven. This will definitely help me get a kickstart.
>>
>> I run git on the command line, with mvn, and eclipse as my IDE. There are
>> some shortcuts for testing a single shard server, or a shard server +
>> controller server, from within eclipse. Take a look at
>> org.apache.blur.thrift.ThriftBlurShardServer and
>> org.apache.blur.thrift.ThriftBlurControllerServer for the main methods
>> that can be executed to run various processes. If you have ZooKeeper
>> running you should be able to run those mains and then step through a
>> query being executed.
>>
>> 4. I started Hadoop (HDFS+MapReduce), Zookeeper, and Blur. What are the
>> steps in actually using it. Where do we start?
>>
>> Take a look at
>> http://www.nearinfinity.com/blogs/aaron_mccurry/an_introduction_to_blur.html
>> as well as the blur-testsuite. That project has some basic programs to
>> create a table, load data, search, etc. And please follow up with more
>> questions if you need more guidance or help.
>>
>> Thanks for the notes about the initial setup and build! I will take a
>> look at the errors.
>>
>> Aaron
>>
>>
>> On Thu, May 2, 2013 at 1:42 AM, rahul challapalli wrote:
>>
>> > Hi,
>> >
>> > I was able to get blur started (shards and controllers). It worked
>> > straight away. Awesome. I have a few more questions. My apologies if
>> > some of the questions are naive.
>> >
>> > 1. I am not able to find the 'blur.*.hostname' properties in the
>> > blur.properties file, but these are listed in the readme file
>> > 2. There seems to be a lot of code. I would greatly appreciate it if
>> > someone can give me pointers before I dig through the codebase.
>> > Something like an architectural overview or a flow explaining how the
>> > search query is resolved.
>> > 3. How do you guys manage your development workspace with eclipse, git,
>> > and maven. This will definitely help me get a kickstart.
>> > 4. I started Hadoop (HDFS+MapReduce), Zookeeper, and Blur. What are the
>> > steps in actually using it. Where do we start?
>> >
>> > Below I outline the steps that I followed to get blur running. I also
>> > got a couple of errors during the build process, which are listed
>> > after the steps. The overall build was successful though.
>> >
>> > Apache Blur Single Node Setup on Mac OS X Lion
>> >
>> > 1. Environment : Single Node Hadoop-1.0.4 and Zookeeper-3.4.5
>> > 2. Get the Blur code from Git using git clone
>> > https://git-wip-us.apache.org/repos/asf/incubator-blur.git
>> > 3. Checkout the branch 0.1.5
>> > 4. Run 'mvn clean install' from the 'src' directory as superuser
>> > 5. Extract the Blur tar.gz file from the 'target/' directory into a
>> > convenient location, set BLUR_HOME to this location, and add it to
>> > .bash_profile
>> > 6. Go to the extracted folder and configure the
>> > $BLUR_HOME/config/blur-env.sh file. The two exports that are required:
>> > export JAVA_HOME=$(/usr/libexec/java_home)
>> > export HADOOP_HOME=/usr/local/hadoop
>> > 7. Set up the $BLUR_HOME/config/blur.properties file. The default site
>> > configuration:
>> > blur.zookeeper.connection=localhost
>> > blur.cluster.name=default
>> > 8. Start blur using $BLUR_HOME/bin/start-all.sh
>> >
>> > Errors during the build process :
>> >
>> > ERROR 20130430_22:47:42:042_PDT [IndexReader-Refresher] writer.BlurIndexRefresher: Unknown error
>> > org.apache.lucene.store.AlreadyClosedException: this Directory is closed
>> >     at org.apache.lucene.store.Directory.ensureOpen(Directory.java:256)
>> >     at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:240)
>> >     at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:679)
>> >     at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
>> >     at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
>> >     at org.apache.lucene.index.StandardDirectoryReader.isCurrent(StandardDirectoryReader.java:326)
>> >     at org.apache.lucene.index.StandardDirectoryReader.doOpenNoWriter(StandardDirectoryReader.java:284)
>> >     at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:247)
>> >     at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:235)
>> >     at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:169)
>> >     at org.apache.blur.manager.writer.BlurIndexReader.refresh(BlurIndexReader.java:82)
>> >     at org.apache.blur.manager.writer.BlurIndexRefresher.refreshInternal(BlurIndexRefresher.java:70)
>> >     at org.apache.blur.manager.writer.BlurIndexRefresher.run(BlurIndexRefresher.java:61)
>> >     at java.util.TimerThread.mainLoop(Timer.java:512)
>> >     at java.util.TimerThread.run(Timer.java:462)
>> > WARN 20130501_20:54:18:018_PDT [main] jmx.MBeanRegistry: Error during unregister
>> > javax.management.InstanceNotFoundException: org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
>> >     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1094)
>> >     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:415)
>> >     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:403)
>> >     at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:507)
>> >     at org.apache.zookeeper.jmx.MBeanRegistry.unregister(MBeanRegistry.java:115)
>> >     at org.apache.zookeeper.jmx.MBeanRegistry.unregister(MBeanRegistry.java:132)
>> >     at org.apache.zookeeper.server.ZooKeeperServer.unregisterJMX(ZooKeeperServer.java:443)
>> >     at org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:436)
>> >     at org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:271)
>> >     at org.apache.zookeeper.server.ZooKeeperServerMain.shutdown(ZooKeeperServerMain.java:127)
>> >     at org.apache.blur.MiniCluster$ZooKeeperServerMainEmbedded.shutdown(MiniCluster.java:339)
>> >     at org.apache.blur.MiniCluster.shutdownZooKeeper(MiniCluster.java:427)
>> >     at org.apache.blur.MiniCluster.shutdownBlurCluster(MiniCluster.java:146)
>> >     at org.apache.blur.thrift.BlurClusterTest.shutdownCluster(BlurClusterTest.java:81)
>> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >     at java.lang.reflect.Method.invoke(Method.java:597)
>> >     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>> >     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>> >     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>> >     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:37)
>> >     at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
>> >     at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:236)
>> >     at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:134)
>> >     at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:113)
>> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >     at java.lang.reflect.Method.invoke(Method.java:597)
>> >     at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
>> >     at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
>> >     at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
>> >     at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:103)
>> >     at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:74)
>> >
>> >
>> > - Rahul
>> >
>> >
>> > On Tue, Apr 30, 2013 at 9:29 PM, rahul challapalli wrote:
>> >
>> > > Aaron,
>> > >
>> > > Thanks for your reply. I will be sure to let you know how it goes.
>> > >
>> > > - Rahul
>> > >
>> > >
>> > > On Tue, Apr 30, 2013 at 7:33 PM, Aaron McCurry wrote:
>> > >
>> > >> Hi Rahul,
>> > >>
>> > >> Welcome! Blur is a young incubator project and with that there is
>> > >> not a lot of documentation. Yet. But we do have a lot of code. :-)
>> > >>
>> > >> Blur uses HDFS for storing indexes, MapReduce for bulk indexing,
>> > >> Thrift for RPC and ZooKeeper for state, and of course Lucene for
>> > >> search. Yes, Blur can and should run alongside a standard Hadoop
>> > >> install (MapReduce + HDFS). It currently works with the 1.0.x
>> > >> version or CDH3 from Cloudera. I'm sure we can get it to work with
>> > >> 2.0.x and CDH4, it just hasn't happened yet. However, the only
>> > >> dependency to run Blur on a single machine is ZooKeeper. HDFS is
>> > >> required for a cluster.
>> > >>
>> > >> To get you started:
>> > >>
>> > >> git clone https://git-wip-us.apache.org/repos/asf/incubator-blur.git
>> > >>
>> > >> # we are currently focusing on getting 0.1.5 to a releasable state.
>> > >> git checkout 0.1.5
>> > >>
>> > >> In the checkout you will find a README.md that is a bit out of date
>> > >> with the code examples but the general theme is correct. For more
>> > >> examples take a look at the blur-testsuite project, there are a lot
>> > >> of code examples in there to get you started.
>> > >>
>> > >> To build the project into a tarball that can be extracted and
>> > >> executed, run "mvn install" from the src/ directory. Once it has
>> > >> successfully executed all the tests and built everything you will
>> > >> find a tar.gz file in the target/ directory in the distribution
>> > >> project.
>> > >>
>> > >> Before you can run Blur, Apache ZooKeeper needs to be running. A
>> > >> default install will work.
>> > >>
>> > >> After extracting the Blur tar.gz file you should be able to run
>> > >> bin/start-all.sh and it should start a Blur controller and a shard
>> > >> server on your local machine.
>> > >>
>> > >> I would love to hear how your initial compile and install goes,
>> > >> because we could use this thread and any information that is
>> > >> exchanged to create a nice little wiki page for 0.1.5.
>> > >>
>> > >> Thanks!
>> > >>
>> > >> Aaron
>> > >>
>> > >>
>> > >> On Tue, Apr 30, 2013 at 2:17 PM, rahul challapalli wrote:
>> > >>
>> > >> > Hi,
>> > >> >
>> > >> > I am new to blur and even ASF in terms of contributing back to a
>> > >> > project. I have decent knowledge about hadoop and mapreduce but am
>> > >> > completely new to search. I come from a Java/PHP background. I am
>> > >> > looking for some direction in setting up blur on my local machine.
>> > >> > I have a single node hadoop installation on my Mac OS X Lion. Is
>> > >> > it an issue if I have HDFS, MapReduce daemons running alongside
>> > >> > blur on the same machine? I would greatly appreciate it if you can
>> > >> > refer me to some setup document as well as an insight into the
>> > >> > architecture of blur. Thank You.
>> > >> >
>> > >> > - Rahul
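
For anyone reading this thread later: the query flow Aaron describes in steps -3 through -7 (the controller fans the query out to the shards, each shard searches in parallel and returns only its own top N, and the partial results are merged into a final top N) can be sketched in plain Java roughly as follows. This is only an illustration of the general scatter-gather pattern, not Blur's actual code. The names TopNMergeSketch, Hit, topN, and scatterGather are hypothetical, and each "shard" here is just a pre-built list of hits standing in for a real Lucene search.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of scatter-gather with a top-N merge; not Blur's API.
final class TopNMergeSketch {

    /** A scored hit; stands in for whatever a real shard search returns. */
    static final class Hit {
        final String id;
        final double score;

        Hit(String id, double score) {
            this.id = id;
            this.score = score;
        }
    }

    /** Returns the n highest-scoring hits, best first. */
    static List<Hit> topN(List<Hit> hits, int n) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    /** Runs one task per shard in parallel, then merges the partial top-N lists. */
    static List<Hit> scatterGather(List<List<Hit>> shards, int n) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            List<Callable<List<Hit>>> tasks = new ArrayList<>();
            for (List<Hit> shard : shards) {
                // Assumption: a real shard would run the parsed query here.
                tasks.add(() -> topN(shard, n));
            }
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> partial : pool.invokeAll(tasks)) {
                merged.addAll(partial.get());
            }
            // Final merge: only the overall top N go back to the caller.
            return topN(merged, n);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Hit> shardA = Arrays.asList(new Hit("a1", 0.9), new Hit("a2", 0.2));
        List<Hit> shardB = Arrays.asList(new Hit("b1", 0.7), new Hit("b2", 0.5));
        for (Hit hit : scatterGather(Arrays.asList(shardA, shardB), 3)) {
            System.out.println(hit.id + " " + hit.score);
        }
    }
}
```

Because each shard returns at most N hits, the merging side never handles more than N times the number of shards, which is presumably why the thread describes top-N results being passed up at each level. For the real thing, see the query methods in BlurControllerServer.java and BlurShardServer.java mentioned above.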
