Re: Resolve a DataImportHandler datasource based on previous entity
On Wed, Jan 12, 2011 at 8:49 PM, alexei wrote:
[...]
> Unfortunately reorganizing the data is not an option for me.
> Multiple databases exist and a third party is taking care of
> populating them. Once a database reaches a certain size, a switch
> occurs and a new database is created with the same table structure.

OK, I understand.

> Gora Mohanty-3 wrote:
>>
>> I meant a script that runs the query that defines the datasources for all
>> fields, writes a Solr DIH configuration file, and then initiates a
>> dataimport.
>>
> Ok, so the query would select only the articles for which the data is
> sitting in a specific datasource. Then, only that one datasource would be
> indexed.
> For each additional datasource would the script initiate another full-import
> with the clean attribute set to false?

I do not think that I completely understand your use case. Would it be possible for you to describe it in detail? Here is my current view of it:

* From some SELECT statement, you can tell which datasource each field should come from in the next import.
* If so, before the start of a data import, a script can run that same SELECT statement and figure out what belongs where.
* In that case, the script can do the following:
  - Write a DIH configuration file from its knowledge of where the fields in the next import are coming from.
  - Do a reload-config to get the new DIH configuration.
  - Initiate a data import.
* It is not clear to me how a delta import and similar things fit into this scenario, i.e., are you also going to be dealing with updates of documents that already exist in the Solr index? However, we can cross that bridge when we come to it.

> I tried to make some changes to DIH that comes with Solr 1.4.1
> The getResolvedEntityAttribute("dataSource"); method seems to do the trick.
> Here is the modified code. It feels awkward but it seems to work.
[...]
> I hope I am not breaking any other functionality...
> Would it be possible to add something like this to a future release?

I am sorry. As things stand, while I would like to find the time to become a contributor to the Solr code, commenting on the above is beyond my current understanding of it. I think that you have the right idea, but am unable to say for sure. Maybe someone more well-versed in Solr can chip in.

I would definitely recommend that you open a JIRA ticket and attach this patch. That way, at least it remains on record. Please include a description of your use case in the ticket.

Regards,
Gora
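The script-driven workflow outlined in this message (write a configuration, reload it, start an import) could be sketched roughly as follows. This is only an illustration and has not been run against a Solr instance. The class name `DihImportPlan`, the datasource name, the JDBC URL, and the query are assumptions made up for the example; the `solr/db/dataimport` path and the `reload-config` and `full-import` commands are the ones mentioned in this thread. The sketch renders a minimal one-datasource DIH configuration and builds the sequence of request URLs, using `clean=false` for every import after the first so that earlier results are kept. Writing the file and sending the requests are left out.

```java
import java.util.ArrayList;
import java.util.List;

class DihImportPlan {

    /** Escapes the characters that are unsafe inside an XML attribute value. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("\"", "&quot;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    /**
     * Renders a minimal DIH configuration that reads one entity from one
     * JDBC datasource. Driver, user, and password attributes are omitted
     * here for brevity; a real configuration would need them.
     */
    static String renderConfig(String dsName, String jdbcUrl, String query) {
        return "<dataConfig>\n"
             + "  <dataSource name=\"" + escape(dsName) + "\" type=\"JdbcDataSource\""
             + " url=\"" + escape(jdbcUrl) + "\"/>\n"
             + "  <document>\n"
             + "    <entity name=\"Article\" dataSource=\"" + escape(dsName) + "\""
             + " query=\"" + escape(query) + "\"/>\n"
             + "  </document>\n"
             + "</dataConfig>\n";
    }

    /**
     * Builds the request URLs for importing from several datasources in turn:
     * reload the freshly written configuration, then run a full import.
     * Only the first import cleans the index.
     */
    static List<String> commandUrls(String handlerUrl, int datasourceCount) {
        List<String> urls = new ArrayList<String>();
        for (int i = 0; i < datasourceCount; i++) {
            urls.add(handlerUrl + "?command=reload-config");
            urls.add(handlerUrl + "?command=full-import&clean=" + (i == 0));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical datasource name, JDBC URL, and query.
        System.out.print(renderConfig("ds1", "jdbc:mysql://dbhost/articles1",
                "select id, body from article"));
        for (String url : commandUrls("http://localhost:8983/solr/db/dataimport", 2)) {
            System.out.println(url);
        }
    }
}
```

A driver script would write each rendered configuration to the DIH config file before issuing the matching pair of requests. Since a full import runs in the background, the script would also need to wait for it to finish (for example by polling `command=status`) before moving on to the next datasource.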
Re: Resolve a DataImportHandler datasource based on previous entity
Hi Gora,

Unfortunately reorganizing the data is not an option for me. Multiple databases exist and a third party is taking care of populating them. Once a database reaches a certain size, a switch occurs and a new database is created with the same table structure.

Gora Mohanty-3 wrote:
>
> I meant a script that runs the query that defines the datasources for all
> fields, writes a Solr DIH configuration file, and then initiates a
> dataimport.
>

Ok, so the query would select only the articles for which the data is sitting in a specific datasource. Then, only that one datasource would be indexed. For each additional datasource would the script initiate another full-import with the clean attribute set to false?

I tried to make some changes to the DIH that comes with Solr 1.4.1. The getResolvedEntityAttribute("dataSource") method seems to do the trick. Here is the modified code. It feels awkward but it seems to work.

org.apache.solr.handler.dataimport.ContextImpl

    public DataSource getDataSource() {
      if (ds != null) return ds;
      if (entity == null) return null;

      // Resolve any variables in the entity's dataSource attribute
      String dataSourceResolved = this.getResolvedEntityAttribute("dataSource");

      if (entity.dataSrc == null) {
        // First use: open the datasource with the resolved name
        entity.dataSrc = dataImporter.getDataSourceInstance(entity, dataSourceResolved, this);
        entity.dataSource = dataSourceResolved;
      } else if (!dataSourceResolved.equals(entity.dataSource)) {
        // The resolved name changed: close the old datasource and open the new one
        entity.dataSrc.close();
        entity.dataSrc = dataImporter.getDataSourceInstance(entity, dataSourceResolved, this);
        entity.dataSource = dataSourceResolved;
      }

      if (entity.dataSrc != null && docBuilder != null && docBuilder.verboseDebug
          && Context.FULL_DUMP.equals(currentProcess())) {
        //debug is not yet implemented properly for deltas
        entity.dataSrc = docBuilder.writer.getDebugLogger().wrapDs(entity.dataSrc);
      }
      return entity.dataSrc;
    }

I hope I am not breaking any other functionality...
Would it be possible to add something like this to a future release?
Regards, Alex -- View this message in context: http://lucene.472066.n3.nabble.com/Resolve-a-DataImportHandler-datasource-based-on-previous-entity-tp2235573p2241653.html Sent from the Solr - User mailing list archive at Nabble.com.
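The idea behind the patch in this message can be illustrated in isolation: treat the `dataSource` attribute as a template, resolve it against values from the parent row, and reopen the connection only when the resolved name changes. The sketch below is standalone Java and is not Solr code. `DataSourceTemplateDemo`, `resolve`, `Switcher`, and the `${parent.dsNumber}` placeholder are hypothetical names chosen for the example. It compares names with `Objects.equals`, so a null resolved name does not throw; whether `getResolvedEntityAttribute` can actually return null in Solr 1.4.1 has not been verified here, but if it can, the `dataSourceResolved.equals(...)` call in the patch would fail in that case.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DataSourceTemplateDemo {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces each ${name} in the template with its value; unknown names become "". */
    static String resolve(String template, Map<String, String> variables) {
        if (template == null) {
            return null;
        }
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = variables.get(m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement(value == null ? "" : value));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Tracks the currently selected datasource name and counts how often it changes. */
    static final class Switcher {
        private boolean open = false;
        private String currentName;
        private int openCount = 0;

        /** Returns true if a datasource had to be (re)opened for this name. */
        boolean select(String resolvedName) {
            if (open && Objects.equals(resolvedName, currentName)) {
                return false; // same datasource as before, keep using it
            }
            // A real implementation would close the old connection
            // and open the new one at this point.
            open = true;
            currentName = resolvedName;
            openCount++;
            return true;
        }

        int openCount() {
            return openCount;
        }
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<String, String>();
        row.put("parent.dsNumber", "7"); // hypothetical value from the parent entity
        Switcher switcher = new Switcher();
        String name = resolve("ds${parent.dsNumber}", row);
        System.out.println(name + " reopened: " + switcher.select(name));
        System.out.println(name + " reopened: " + switcher.select(name));
    }
}
```

With 200 or so datasources, the order in which parent rows arrive would matter for a scheme like this: if consecutive rows alternate between datasources, every row triggers a close and reopen.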
Re: Resolve a DataImportHandler datasource based on previous entity
On Wed, Jan 12, 2011 at 1:40 AM, alexei wrote:
[...]
> The datasource number is stored in the database.
> The parent entity queries for this number and in theory it
> should become available to the child entity - "Article" in my case.

I do not think that it is possible to have the datasource name come from a variable.

> I am initiating the import via solr/db/dataimport?command=full-import
>
> Script is a good idea, but I will have close to 200+ datasources and I would
> have to generate a map of all the Article ids each time I do a full import
> or update.
> Did you mean a script that would import all the articles from each
> Datasource and then reload
> the config solr/db/dataimport?command=reload-config ?

I meant a script that runs the query that defines the datasources for all fields, writes a Solr DIH configuration file, and then initiates a dataimport.

> In my mind this should be following the same mechanism which resolves
> variables in queries.
[...]

It ought to be possible to allow this syntax. I think that people have not had a need for this.

Another possibility might be to revisit how your data are organized. Could you explain why you need to use multiple datasources (in this context, presumably this means multiple databases?), rather than multiple tables?

Regards,
Gora
Re: Resolve a DataImportHandler datasource based on previous entity
Hi Gora,

Thank you for your reply.

The datasource number is stored in the database. The parent entity queries for this number and in theory it should become available to the child entity - "Article" in my case.

I am initiating the import via solr/db/dataimport?command=full-import

Script is a good idea, but I will have close to 200+ datasources and I would have to generate a map of all the Article ids each time I do a full import or update. Did you mean a script that would import all the articles from each Datasource and then reload the config solr/db/dataimport?command=reload-config ?

In my mind this should be following the same mechanism which resolves variables in queries.

Any other ideas?

Regards,
Alex
--
View this message in context: http://lucene.472066.n3.nabble.com/Resolve-a-DataImportHandler-datasource-based-on-previous-entity-tp2235573p2236472.html
Re: Resolve a DataImportHandler datasource based on previous entity
On Tue, Jan 11, 2011 at 11:10 PM, alexei wrote:
>
> Hi,
>
> I am in a situation where the data needed for one of the fields in my
> document
> may be sitting in a different datasource each time.
[...]

At what point of time will you be aware of which datasource the field is coming from? How are you initiating the import?

One possibility might be to start the import from a script, which first rewrites the data import configuration file according to the datasource that the field is expected to come from.

Regards,
Gora
Resolve a DataImportHandler datasource based on previous entity
Hi,

I am in a situation where the data needed for one of the fields in my document may be sitting in a different datasource each time. I would like to be able to configure something like this:

[...]

--
View this message in context: http://lucene.472066.n3.nabble.com/Resolve-a-DataImportHandler-datasource-based-on-previous-entity-tp2235573p2235573.html