Code looks okay, so it might just be the full volume that is in the way.

Jörg

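One way to confirm that a full volume is the cause before re-running the indexer is to check the usable space on the partition that holds the data directory. The following is only a minimal sketch under assumptions: the path /var/lib/elasticsearch is the commented-out DATA_DIR default from the /etc/default/elasticsearch file quoted below, the 5 GB threshold is arbitrary, and the class and method names are made up for this example (they are not part of Elasticsearch).

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Elasticsearch: reports whether the volume
// holding a directory still has a given amount of usable space.
final class DiskSpaceCheck {

    // Usable bytes on the file store (partition) that contains the given path.
    static long usableBytes(Path dir) throws IOException {
        FileStore store = Files.getFileStore(dir);
        return store.getUsableSpace();
    }

    // True when the usable space is at least the required amount.
    static boolean hasRoom(long usableBytes, long requiredBytes) {
        return usableBytes >= requiredBytes;
    }

    public static void main(String[] args) throws IOException {
        // Assumed path: the packaged default data directory; pass another as args[0].
        Path dataDir = Paths.get(args.length > 0 ? args[0] : "/var/lib/elasticsearch");
        long required = 5L * 1024 * 1024 * 1024; // assumed threshold: 5 GB
        long usable = usableBytes(dataDir);
        System.out.println(dataDir + ": " + usable + " bytes usable, enough = "
                + hasRoom(usable, required));
    }
}
```

Running it against whatever path.data actually points to shows whether the partition has headroom; the threshold should be picked from the expected index size rather than taken from this sketch.
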
On Tue, Sep 9, 2014 at 8:44 PM, Joshua P <jpetersen...@gmail.com> wrote:

> This is the code I've been using to index:
>
> I'm going to try to fix the running out of space issue and then try
> slimming down settings. Thank you.
>
> public class Indexer {
>
>     private static final Logger logger = LogManager.getLogger("ESBulkUploader");
>
>     public static void main(String[] args) throws IOException, NoSuchFieldException {
>
>         DBConnection dbConn = new DBConnection("");
>
>         String query = "SELECT TOP 300000 * FROM vw_PropertyGeneralInfo WHERE Country_id = 1 ORDER BY Property_id DESC";
>
>         System.out.println("getting data");
>         List<PropertyGeneralInfoRow> pgiTable = dbConn.ExecuteQueryWithoutParameters(query);
>         System.out.println("got data");
>
>         ObjectMapper mapper = new ObjectMapper();
>
>         Settings settings = ImmutableSettings.settingsBuilder()
>                 .put("cluster.name", "property_transaction_data").build();
>
>         Client client = new TransportClient(settings).addTransportAddress(
>                 new InetSocketTransportAddress("192.168.133.131", 9300));
>
>         BulkProcessor bulkProcessor = BulkProcessor.builder(client, new BulkProcessor.Listener() {
>             @Override
>             public void beforeBulk(long executionId, BulkRequest request) {
>                 System.out.println("About to index " + request.numberOfActions()
>                         + " records of size " + request.estimatedSizeInBytes() + ".");
>             }
>
>             @Override
>             public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
>                 if( response.hasFailures() ){
>                     for( BulkItemResponse item : response.getItems() ){
>                         BulkItemResponse.Failure failure = item.getFailure();
>                         if( failure != null ){
>                             System.out.println(failure.getId() + " -- "
>                                     + failure.getStatus().name() + " -- " + failure.getMessage()
>                                     + " -- " + failure.getType());
>                         }
>                     }
>                 }
>
>                 System.out.println("Successfully indexed " + request.numberOfActions()
>                         + " records in " + response.getTook() + ".");
>             }
>
>             @Override
>             public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
>                 System.out.println("failure somewhere on " + request.toString());
>                 failure.printStackTrace();
>                 logger.warn("failure on " + request.toString());
>             }
>         }).setBulkActions(500).setConcurrentRequests(1).build();
>
>         for( int i = 0; i < pgiTable.size(); i++ ){
>             //prep location field
>             PropertyGeneralInfoRow pgiRow = pgiTable.get(i);
>
>             Double[] location = {pgiRow.getLon_dbl(), pgiRow.getLat_dbl()};
>
>             geocode geocode = new geocode();
>
>             geocode.setLocation(location);
>
>             pgiRow.setGeocode(geocode);
>
>             // prep full address string
>             pgiRow.setFulladdressstring(pgiRow.getPropertykey_tx() + ", " +
>                     pgiRow.getCity_tx() + ", " + pgiRow.getStateprov_cd() +
>                     ", " + pgiRow.getCountry_tx() + ", " + pgiRow.getPostalcode_tx());
>
>             String jsonRow = mapper.writeValueAsString(pgiRow);
>
>             if( jsonRow != null && !jsonRow.isEmpty() && !jsonRow.equals("{}") ){
>                 bulkProcessor.add(new IndexRequest("rcapropertydata", "rcaproperty").source(jsonRow.getBytes()));
>                 // bulkProcessor.add(client.prepareIndex("rcapropertydata", "rcaproperty").setSource(jsonRow));
>             }
>             else{
>                 // don't add null strings..
>                 try{
>                     System.out.println(pgiRow.toString());
>                 }
>                 catch (Exception e){
>                     System.out.println("Some error in the toString() method...");
>                 }
>                 System.out.println("Some json output was null. -- " + pgiRow.getProperty_id().toString());
>             }
>
>         }
>
>         bulkProcessor.flush();
>         bulkProcessor.close();
>
>     }
>
> }
>
> On Tuesday, September 9, 2014 1:57:54 PM UTC-4, Jörg Prante wrote:
>>
>> Check the path.data setting in config/elasticsearch.yml
>>
>> Jörg
>>
>> On Tue, Sep 9, 2014 at 7:50 PM, Joshua P <jpeter...@gmail.com> wrote:
>>
>>> Just reran the indexer and found this error coming up. I'm running out
>>> of disk space on the partition ES wants to write to.
>>>
>>> F38KqHhnRDWtiJCss5Wz0g -- INTERNAL_SERVER_ERROR --
>>> TranslogException[[index_type][0] Failed to write operation
>>> [org.elasticsearch.index.translog.Translog$Create@6f1f6b1e]]; nested:
>>> IOException[No space left on device]; -- index_type
>>>
>>> Where would I change the write location? Which config file?
>>>
>>> On Tuesday, September 9, 2014 1:28:21 PM UTC-4, Joshua P wrote:
>>>>
>>>> Hi Jörg,
>>>>
>>>> Can you elaborate on what you mean by I still need more fine tuning?
>>>>
>>>> I've upped the heap size to 4g (in both places I mentioned before
>>>> because it's not clear to me which one ES actually uses). I haven't tried
>>>> to index again yet.
>>>> Other than throttling my indexing, what are some other things I need to
>>>> be thinking about?
>>>>
>>>> On Tuesday, September 9, 2014 12:53:35 PM UTC-4, Jörg Prante wrote:
>>>>>
>>>>> Set ES_HEAP_SIZE to at least 1 GB. For smaller heaps like 512m and
>>>>> indexing around 1 million docs, you need some more fine tuning, which is
>>>>> complicated. Your machine is OK to set the heap to 4 GB, which is 50% of
>>>>> 8 GB RAM.
>>>>>
>>>>> Jörg
>>>>>
>>>>> On Tue, Sep 9, 2014 at 5:39 PM, Joshua P <jpeter...@gmail.com> wrote:
>>>>>
>>>>>> Here is /etc/default/elasticsearch
>>>>>>
>>>>>> # Run Elasticsearch as this user ID and group ID
>>>>>> #ES_USER=elasticsearch
>>>>>> #ES_GROUP=elasticsearch
>>>>>>
>>>>>> # Heap Size (defaults to 256m min, 1g max)
>>>>>> ES_HEAP_SIZE=512m
>>>>>>
>>>>>> # Heap new generation
>>>>>> #ES_HEAP_NEWSIZE=
>>>>>>
>>>>>> # max direct memory
>>>>>> #ES_DIRECT_SIZE=
>>>>>>
>>>>>> # Maximum number of open files, defaults to 65535.
>>>>>> MAX_OPEN_FILES=65535
>>>>>>
>>>>>> # Maximum locked memory size. Set to "unlimited" if you use the
>>>>>> # bootstrap.mlockall option in elasticsearch.yml. You must also set
>>>>>> # ES_HEAP_SIZE.
>>>>>> MAX_LOCKED_MEMORY=unlimited
>>>>>>
>>>>>> # Maximum number of VMA (Virtual Memory Areas) a process can own
>>>>>> #MAX_MAP_COUNT=262144
>>>>>>
>>>>>> # Elasticsearch log directory
>>>>>> #LOG_DIR=/var/log/elasticsearch
>>>>>>
>>>>>> # Elasticsearch data directory
>>>>>> #DATA_DIR=/var/lib/elasticsearch
>>>>>>
>>>>>> # Elasticsearch work directory
>>>>>> #WORK_DIR=/tmp/elasticsearch
>>>>>>
>>>>>> # Elasticsearch configuration directory
>>>>>> #CONF_DIR=/etc/elasticsearch
>>>>>>
>>>>>> # Elasticsearch configuration file (elasticsearch.yml)
>>>>>> #CONF_FILE=/etc/elasticsearch/elasticsearch.yml
>>>>>>
>>>>>> # Additional Java OPTS
>>>>>> #ES_JAVA_OPTS=
>>>>>>
>>>>>> # Configure restart on package upgrade (true, every other setting will lead to not restarting)
>>>>>> #RESTART_ON_UPGRADE=true
>>>>>>
>>>>>> I also see the same setting in /etc/init.d/elasticsearch. Do you know
>>>>>> which file takes priority? And what a good size would be?
>>>>>>
>>>>>> On Tuesday, September 9, 2014 11:32:19 AM UTC-4, vineeth mohan wrote:
>>>>>>>
>>>>>>> Hello Joshua,
>>>>>>>
>>>>>>> I am not sure which variable you are referring to in the memory
>>>>>>> settings in the config file; please paste the comment and config.
>>>>>>> I usually change the config from the init.d script.
>>>>>>>
>>>>>>> The best approach would be to bulk index, say, 10,000 feeds in sync mode,
>>>>>>> wait until everything is indexed, and then proceed to the next batch.
>>>>>>> I am not sure about the Java API, but long back I used to curl
>>>>>>> this stats API and see how many requests were rejected.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Vineeth
>>>>>>>
>>>>>>> On Tue, Sep 9, 2014 at 8:58 PM, Joshua P <jpeter...@gmail.com> wrote:
>>>>>>>
>>>>>>>> You also said you wouldn't recommend indexing that much information
>>>>>>>> at once. How would you suggest breaking it up and what status should I
>>>>>>>> look for before doing another batch? I have to come up with some process
>>>>>>>> that is repeatable and mostly automated.
>>>>>>>>
>>>>>>>> On Tuesday, September 9, 2014 11:12:59 AM UTC-4, Joshua P wrote:
>>>>>>>>>
>>>>>>>>> Thanks for the reply, Vineeth!
>>>>>>>>>
>>>>>>>>> What's a practical heap size? I've seen some people saying they
>>>>>>>>> set it to 30gb but this confuses me because in the
>>>>>>>>> /etc/default/elasticsearch file, the comment suggests the max is only
>>>>>>>>> 1gb?
>>>>>>>>>
>>>>>>>>> I'll look into the threadpool issue. Is there a Java API for
>>>>>>>>> monitoring Cluster Node health? Can you point me at an example or
>>>>>>>>> give me a link to that?
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> On Tuesday, September 9, 2014 10:52:35 AM UTC-4, vineeth mohan wrote:
>>>>>>>>>>
>>>>>>>>>> Hello Joshua,
>>>>>>>>>>
>>>>>>>>>> I have a feeling this has something to do with the threadpool.
>>>>>>>>>> There is a limit on the number of feeds that can be queued for indexing.
>>>>>>>>>>
>>>>>>>>>> Try increasing the size of the threadpool queue for index and bulk to
>>>>>>>>>> a large number.
>>>>>>>>>> Also, through the cluster node API on threadpool, you can see if any
>>>>>>>>>> request has failed.
>>>>>>>>>> Monitor this API for any failed request due to large volume.
>>>>>>>>>>
>>>>>>>>>> Threadpool - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html
>>>>>>>>>> Threadpool stats - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-stats.html
>>>>>>>>>>
>>>>>>>>>> Having said that, I won't recommend bulk indexing that much
>>>>>>>>>> information at a time, and 512 MB is not going to help much.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Vineeth
>>>>>>>>>>
>>>>>>>>>> On Tue, Sep 9, 2014 at 7:48 PM, Joshua P <jpeter...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi there!
>>>>>>>>>>>
>>>>>>>>>>> I'm trying to do a one-time index of about 800,000 records into
>>>>>>>>>>> an instance of elasticsearch. But I'm having a bit of trouble. It
>>>>>>>>>>> continually fails around 200,000 records. Looking at it in the
>>>>>>>>>>> Elasticsearch Head Plugin, my index goes offline and becomes
>>>>>>>>>>> unrecoverable.
>>>>>>>>>>>
>>>>>>>>>>> For now, I have it running on a VM on my personal machine.
>>>>>>>>>>>
>>>>>>>>>>> VM Config:
>>>>>>>>>>> Ubuntu Server 14.04 64-Bit
>>>>>>>>>>> 8 GB RAM
>>>>>>>>>>> 2 Processors
>>>>>>>>>>> 32 GB SSD
>>>>>>>>>>>
>>>>>>>>>>> Java
>>>>>>>>>>> java version "1.7.0_65"
>>>>>>>>>>> OpenJDK Runtime Environment (IcedTea 2.5.1) (7u65-2.5.1-4ubuntu1~0.14.04.2)
>>>>>>>>>>> OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
>>>>>>>>>>>
>>>>>>>>>>> Elasticsearch is using mostly the defaults. This is the output of:
>>>>>>>>>>> curl http://localhost:9200/_nodes/process?pretty
>>>>>>>>>>> {
>>>>>>>>>>>   "cluster_name" : "property_transaction_data",
>>>>>>>>>>>   "nodes" : {
>>>>>>>>>>>     "KlFkO_qgSOKmV_jjj5xeVw" : {
>>>>>>>>>>>       "name" : "Marvin Flumm",
>>>>>>>>>>>       "transport_address" : "inet[/192.168.133.131:9300]",
>>>>>>>>>>>       "host" : "ubuntu-es",
>>>>>>>>>>>       "ip" : "127.0.1.1",
>>>>>>>>>>>       "version" : "1.3.2",
>>>>>>>>>>>       "build" : "dee175d",
>>>>>>>>>>>       "http_address" : "inet[/192.168.133.131:9200]",
>>>>>>>>>>>       "process" : {
>>>>>>>>>>>         "refresh_interval_in_millis" : 1000,
>>>>>>>>>>>         "id" : 1092,
>>>>>>>>>>>         "max_file_descriptors" : 65535,
>>>>>>>>>>>         "mlockall" : true
>>>>>>>>>>>       }
>>>>>>>>>>>     }
>>>>>>>>>>>   }
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> I adjusted ES_HEAP_SIZE to 512mb.
>>>>>>>>>>>
>>>>>>>>>>> I'm using the following code to pull data from SQL Server and
>>>>>>>>>>> index it.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>>> Google Groups "elasticsearch" group.
>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from
>>>>>>>>>>> it, send an email to elasticsearc...@googlegroups.com.
>>>>>>>>>>> To view this discussion on the web visit
>>>>>>>>>>> https://groups.google.com/d/msgid/elasticsearch/f94f96d4-8c3f-462f-bdcf-df717cbc6269%40googlegroups.com
>>>>>>>>>>> For more options, visit https://groups.google.com/d/optout.