Re: Resolve a DataImportHandler datasource based on previous entity

2011-01-12 Thread Gora Mohanty
On Wed, Jan 12, 2011 at 8:49 PM, alexei  wrote:
[...]
> Unfortunately reorganizing the data is not an option for me.
> Multiple databases exist and a third party is taking care of
> populating them. Once a database reaches a certain size, a switch
> occurs and a new database is created with the same table structure.

OK, I understand.

> Gora Mohanty-3 wrote:
>>
>> I meant a script that runs the query that defines the datasources for all
>> fields, writes a Solr DIH configuration file, and then initiates a
>> dataimport.
>>
> Ok, so the query would select only the articles for which the data is
> sitting in a specific datasource. Then, only that one datasource would be
> indexed.
> For each additional datasource would the script initiate another full-import
> with the clean attribute set to false?

I do not think that I am completely understanding your use case.
Would it be possible for you to describe it in detail? Here is my
current view of it:
* From some SELECT statement, it is possible for you to tell
  which datasource each field should come from in the next import.
* If so, before the start of a data import, a script can run that same
  SELECT statement, and figure out what belongs where.
* In that case, the script can do the following:
  - Write a DIH configuration file from its knowledge of where the
    fields in the next import are coming from.
  - Do a reload-config to get the new DIH configuration.
  - Initiate a data import
* It is not clear to me how a delta import, and similar things fit
  into this scenario. I.e., are you also going to be dealing with
  updates of documents that already exist in the Solr index?
  However, we can cross that bridge when we come to it.
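
The script steps above can be sketched roughly as follows. This is only an
illustration under stated assumptions, not a tested import script: the
datasource names (db1, db2), the JDBC URLs, the SQL query, and the host and
port are all made up; only the solr/db/dataimport path and the reload-config
and full-import commands come from this thread. The sketch just prints the
configuration and the request URLs, it does not write files or call Solr.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DihConfigSketch {

    /** Escapes characters that are unsafe inside an XML attribute value. */
    static String escapeAttr(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /**
     * Builds a DIH configuration with one JDBC dataSource and an "Article"
     * entity bound to it. A real configuration would also need the driver,
     * user and password attributes; they are left out of this sketch.
     */
    static String buildConfig(String dsName, String jdbcUrl, String query) {
        StringBuilder sb = new StringBuilder();
        sb.append("<dataConfig>\n");
        sb.append("  <dataSource name=\"").append(escapeAttr(dsName))
          .append("\" type=\"JdbcDataSource\" url=\"")
          .append(escapeAttr(jdbcUrl)).append("\"/>\n");
        sb.append("  <document>\n");
        sb.append("    <entity name=\"Article\" dataSource=\"")
          .append(escapeAttr(dsName)).append("\" query=\"")
          .append(escapeAttr(query)).append("\"/>\n");
        sb.append("  </document>\n");
        sb.append("</dataConfig>\n");
        return sb.toString();
    }

    /** Builds the request URL for a DIH command; clean applies to full-import only. */
    static String commandUrl(String handlerUrl, String command, boolean clean) {
        String url = handlerUrl + "?command=" + command;
        if ("full-import".equals(command)) {
            url += "&clean=" + clean;
        }
        return url;
    }

    public static void main(String[] args) {
        // Hypothetical datasources; in practice these would come from the
        // SELECT statement that says where the data lives.
        Map<String, String> dataSources = new LinkedHashMap<>();
        dataSources.put("db1", "jdbc:mysql://localhost/articles1");
        dataSources.put("db2", "jdbc:mysql://localhost/articles2");

        String handler = "http://localhost:8983/solr/db/dataimport";
        boolean first = true;
        for (Map.Entry<String, String> e : dataSources.entrySet()) {
            // 1. Write the configuration for this datasource (printed here).
            System.out.println(buildConfig(e.getKey(), e.getValue(),
                    "SELECT id, body FROM article"));
            // 2. Reload the configuration, 3. start the import. Only the
            //    first import cleans the index; later ones keep what is there.
            System.out.println(commandUrl(handler, "reload-config", false));
            System.out.println(commandUrl(handler, "full-import", first));
            first = false;
        }
    }
}
```

Note that a full-import request returns before the import has finished, so a
real script would have to wait for the import to complete (for example by
polling command=status) before reloading the configuration for the next
datasource.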

> I tried to make some changes to the DIH that comes with Solr 1.4.1.
> The getResolvedEntityAttribute("dataSource"); method seems to do the trick.
> Here is the modified code. It feels awkward but it seems to work.
[...]
> I hope I am not breaking any other functionality...
> Would it be possible to add something like this to a future release?

I am sorry. As things stand, while I do want to be able to get the
time to become a contributor to Solr code, it is beyond my current
understanding of it to be able to comment on the above. I think that
you have the right idea, but am unable to say for sure. Maybe someone
more well-versed in Solr can chip in. I would definitely recommend
that you open a JIRA ticket, and attach this patch. That way, at least
it remains on record. Please include a description of your use case
in the ticket.

Regards,
Gora


Re: Resolve a DataImportHandler datasource based on previous entity

2011-01-12 Thread alexei

Hi Gora,

Unfortunately reorganizing the data is not an option for me.
Multiple databases exist and a third party is taking care of
populating them. Once a database reaches a certain size, a switch
occurs and a new database is created with the same table structure.


Gora Mohanty-3 wrote:
> 
> I meant a script that runs the query that defines the datasources for all
> fields, writes a Solr DIH configuration file, and then initiates a
> dataimport.
> 
Ok, so the query would select only the articles for which the data is 
sitting in a specific datasource. Then, only that one datasource would be
indexed.
For each additional datasource would the script initiate another full-import
with the clean attribute set to false?


I tried to make some changes to the DIH that comes with Solr 1.4.1.
The getResolvedEntityAttribute("dataSource"); method seems to do the trick.
Here is the modified code. It feels awkward but it seems to work.

org.apache.solr.handler.dataimport.ContextImpl

  public DataSource getDataSource() {
    if (ds != null) return ds;
    if (entity == null) return null;

    String dataSourceResolved =
        this.getResolvedEntityAttribute("dataSource");

    if (entity.dataSrc == null) {
      entity.dataSrc = dataImporter.getDataSourceInstance(entity,
          dataSourceResolved, this);
      entity.dataSource = dataSourceResolved;
    } else if (!dataSourceResolved.equals(entity.dataSource)) {
      entity.dataSrc.close();
      entity.dataSrc = dataImporter.getDataSourceInstance(entity,
          dataSourceResolved, this);
      entity.dataSource = dataSourceResolved;
    }
    if (entity.dataSrc != null && docBuilder != null &&
        docBuilder.verboseDebug &&
        Context.FULL_DUMP.equals(currentProcess())) {
      //debug is not yet implemented properly for deltas
      entity.dataSrc =
          docBuilder.writer.getDebugLogger().wrapDs(entity.dataSrc);
    }
    return entity.dataSrc;
  }

I hope I am not breaking any other functionality... 
Would it be possible to add something like this to a future release?

Regards,
Alex



-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Resolve-a-DataImportHandler-datasource-based-on-previous-entity-tp2235573p2241653.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Resolve a DataImportHandler datasource based on previous entity

2011-01-11 Thread Gora Mohanty
On Wed, Jan 12, 2011 at 1:40 AM, alexei  wrote:
[...]
> The datasource number is stored in the database.
> The parent entity queries for this number and in theory it
> should become available to the child entity - "Article" in my case.

I do not think that it is possible to have the datasource name
come from a variable.

> I am initiating the import via solr/db/dataimport?command=full-import
>
> Script is a good idea, but I will have close to 200+ datasources and I would
> have to generate a map of all the Article ids each time I do a full import
> or update.
> Did you mean a script that would import all the articles from each
> Datasource and then reload
> the config solr/db/dataimport?command=reload-config ?

I meant a script that runs the query that defines the datasources for all
fields, writes a Solr DIH configuration file, and then initiates a dataimport.

> In my mind this should be following the same mechanism which resolves
> variables in queries.
[...]

It ought to be possible to allow this syntax. I think that people have
not had a need for this.

Another possibility might be to revisit how your data are organized.
Could you explain why you need to use multiple datasources (in this
context, presumably this means multiple databases?), rather than
multiple tables?

Regards,
Gora


Re: Resolve a DataImportHandler datasource based on previous entity

2011-01-11 Thread alexei

Hi Gora,

Thank you for your reply.

The datasource number is stored in the database.
The parent entity queries for this number and in theory it
should become available to the child entity - "Article" in my case.

I am initiating the import via solr/db/dataimport?command=full-import

Script is a good idea, but I will have close to 200+ datasources and I would
have to generate a map of all the Article ids each time I do a full import
or update.
Did you mean a script that would import all the articles from each
Datasource and then reload the config
solr/db/dataimport?command=reload-config ?

In my mind this should be following the same mechanism which resolves
variables in queries.
Any other ideas?

Regards,
Alex
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Resolve-a-DataImportHandler-datasource-based-on-previous-entity-tp2235573p2236472.html


Re: Resolve a DataImportHandler datasource based on previous entity

2011-01-11 Thread Gora Mohanty
On Tue, Jan 11, 2011 at 11:10 PM, alexei  wrote:
>
> Hi,
>
> I am in a situation where the data needed for one of the fields in my
> document may be sitting in a different datasource each time.
[...]

At what point of time will you be aware of which datasource
the field is coming from? How are you initiating the import?
One possibility might be to start the import from a script, which
first rewrites the data import configuration file according to the
datasource that the field is expected to come from.

Regards,
Gora


Resolve a DataImportHandler datasource based on previous entity

2011-01-11 Thread alexei

Hi,

I am in a situation where the data needed for one of the fields in my
document may be sitting in a different datasource each time.

I would like to be able to configure something like this:
http://lucene.472066.n3.nabble.com/Resolve-a-DataImportHandler-datasource-based-on-previous-entity-tp2235573p2235573.html