Re: Techniques for Retrieving Hits
On 5/14/2018 3:13 PM, Terry Steichen wrote:
> I posted this note because I've not seen list comments pertaining to the
> job of actually locating and retrieving hitlist documents.

How documents are retrieved will be highly dependent on your setup. Here's how things usually go:

If the original data came from a database, then the system where people do their searches should know how to talk to the database, and use information in the search results to look up the full original document in the database.

If the source data is on a file server, then the system where people do their searches will need to have the file server storage mounted. It will then use information in the search results to access the full original document.

Ditto for any other kind of canonical data store with Solr as the search engine.

The system where searches are done will be implemented by you. It will be up to that system to handle any kind of security filtering for both Solr searches and document access.

Solr should not be exposed directly to end users. Most of the time, what's in Solr is not particularly sensitive ... but when Solr is exposed to people who cannot be trusted, those end users may be able to change or delete any data in Solr. They might also be able to send denial of service queries directly to Solr.

Thanks,
Shawn
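For the file server case described above, a minimal sketch of how the search application might turn a search result into a file on the mounted storage. The archive root and the stored field name `path` are assumptions made for illustration; the thread does not say what the index actually contains.

```python
from pathlib import Path

# Hypothetical mount point for the document archive (not named in the thread).
ARCHIVE_ROOT = Path("/srv/document-archive")


def resolve_document(doc: dict, root: Path = ARCHIVE_ROOT) -> Path:
    """Map one Solr result document to a file under the archive root.

    Assumes each indexed document stores its location relative to the
    archive in a field called "path" (a hypothetical field name).
    """
    relative = doc["path"]
    root = root.resolve()
    candidate = (root / relative).resolve()
    try:
        # relative_to() raises ValueError when candidate is outside root,
        # which catches values such as "../../etc/passwd" or "/etc/passwd".
        candidate.relative_to(root)
    except ValueError:
        raise PermissionError(f"path escapes archive: {relative!r}")
    return candidate
```

The application would call this on each hit it wants to serve and apply its own access checks before opening the file, so neither Solr nor the raw file store is reachable by end users.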
Re: Techniques for Retrieving Hits
Shawn,

As noted in my embedded comments below, I don't really see the problem you apparently do. Maybe I'm missing something important (which certainly wouldn't be the first - or last - time that happened).

I posted this note because I've not seen list comments pertaining to the job of actually locating and retrieving hitlist documents. My way "seems" to work, and it is quite simple and compact. I just threw it out seeking a sanity check from others.

Terry

On 05/14/2018 11:32 AM, Shawn Heisey wrote:
> On 5/14/2018 6:46 AM, Terry Steichen wrote:
>> In order to allow users to retrieve the documents that match a query, I
>> make use of the embedded Jetty container to provide file server
>> functionality. To make this happen, I provide a symbolic link between
>> the actual document archive, and the Jetty file server. This seems
>> somewhat of a kludge, and I'm wondering if others have a better way to
>> retrieve the desired documents? (I'm not too concerned about security
>> because I use ssh port forwarding to connect to remote authenticated
>> clients.)
>
> This is not a recommended usage for the servlet container where Solr
> runs.

But if the retrieval traffic is light, what's the problem?

> Solr is a search engine. It is not designed to be a data store,
> although some people do use it that way.

Perhaps I didn't explain it right, but I'm not using it as a datastore (other than the fact that I keep the actual file repository on the same machine on which Solr runs). I've got plenty of storage, so that's not an issue, and, as I mentioned above, traffic is quite light.

> If systems running Solr clients want to access all the information for
> a document when the search results do not contain all the information,
> they should use what IS in the search results to access that data from
> the system where it is stored -- that could be a database, a file
> server, a webserver, or similar.

Perhaps I'm missing something, but search results cannot "contain all the information", can they? I use highlighting, but that's just showing a few snippets - not a substitute for the document itself.

> Thanks,
> Shawn
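On the question of what search results can contain: a result only needs to carry enough stored fields to locate the original, alongside the highlighting snippets. A rough sketch of a query that asks for both is below. `q`, `fl`, `hl`, `hl.fl`, `rows` and `wt` are standard Solr request parameters, but the field names `id`, `path` and `content` and the core URL in the usage note are assumptions, not details from this thread.

```python
from urllib.parse import urlencode


def build_select_url(base_url: str, user_query: str, rows: int = 10) -> str:
    """Build a Solr /select URL returning locator fields plus highlights."""
    params = {
        "q": user_query,
        "fl": "id,path",     # hypothetical stored fields that identify the original
        "hl": "true",        # turn on highlighting
        "hl.fl": "content",  # hypothetical field to take snippets from
        "rows": rows,
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"
```

For example, `build_select_url("http://localhost:8983/solr/documents", "jetty symlink")` gives a URL that the search application itself would request. The snippets are shown to the user, and the `path` value is what the application uses to fetch the full document from wherever it is stored.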
Re: Techniques for Retrieving Hits
On 5/14/2018 6:46 AM, Terry Steichen wrote:
> In order to allow users to retrieve the documents that match a query, I
> make use of the embedded Jetty container to provide file server
> functionality. To make this happen, I provide a symbolic link between
> the actual document archive, and the Jetty file server. This seems
> somewhat of a kludge, and I'm wondering if others have a better way to
> retrieve the desired documents? (I'm not too concerned about security
> because I use ssh port forwarding to connect to remote authenticated
> clients.)

This is not a recommended usage for the servlet container where Solr runs.

Solr is a search engine. It is not designed to be a data store, although some people do use it that way.

If systems running Solr clients want to access all the information for a document when the search results do not contain all the information, they should use what IS in the search results to access that data from the system where it is stored -- that could be a database, a file server, a webserver, or similar.

Thanks,
Shawn