[
https://issues.apache.org/jira/browse/SOLR-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927990#action_12927990
]
Greg Fodor edited comment on SOLR-2202 at 11/3/10 4:48 PM:
-----------------------------------------------------------
This update to the patch includes a number of performance enhancements and is
the version of the patch we will likely push to production.
First, this patch introduces the defaultCurrency parameter, which defaults to
USD. The default currency allows you to omit the currency code in the field
value (i.e., "5000" instead of "5000,USD"). More importantly, it plays a pivotal
role in improving performance.
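A minimal sketch of how a value with an omitted currency code might be parsed,
assuming the "amount[,CODE]" format from the example above. The class and method
names (MoneyValue, parse) are hypothetical and not taken from the patch.

```java
import java.math.BigDecimal;
import java.util.Locale;

/** Hypothetical sketch: parse "amount[,CODE]", falling back to a default currency. */
final class MoneyValue {
    final BigDecimal amount;
    final String currencyCode;

    MoneyValue(BigDecimal amount, String currencyCode) {
        this.amount = amount;
        this.currencyCode = currencyCode;
    }

    static MoneyValue parse(String raw, String defaultCurrency) {
        // Split on the first comma only: "5000,USD" -> ["5000", "USD"].
        String[] parts = raw.trim().split(",", 2);
        BigDecimal amount = new BigDecimal(parts[0].trim());
        // No currency code given: assume the configured default currency.
        String code = parts.length == 2
                ? parts[1].trim().toUpperCase(Locale.ROOT)
                : defaultCurrency;
        return new MoneyValue(amount, code);
    }
}
```

With defaultCurrency set to USD, "5000" and "5000,USD" would parse to the same
value.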
The previous patches took a naive approach to constructing the trie bounding
range, using the current max and min exchange rates to the target currency.
This proved to be of limited use: because the relative values of currency units
vary wildly, the bounding range often spanned the entire document set.
The solution I took in this patch is to compute the bounding range by taking
into account the "currency drift." Before getting to that, though, note that the
indexing process now adds a new dynamic field that indexes the field's value in
the default currency, converted at the exchange rate in effect at indexing time.
(Additionally, a stored field is optionally created if the money field is
marked as stored.)
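To illustrate the index-time step, here is a sketch of converting a value into
the default currency at the rate in effect when the document is indexed. The
"FROM->TO" map of current rates and the DefaultCurrencyIndexer name are
assumptions for illustration; the patch itself reads its rates from the currency
config file.

```java
import java.math.BigDecimal;
import java.util.Map;

/** Hypothetical sketch of computing the default-currency value at index time. */
final class DefaultCurrencyIndexer {
    static BigDecimal toDefaultCurrency(BigDecimal amount, String currency,
                                        String defaultCurrency,
                                        Map<String, BigDecimal> currentRates) {
        if (currency.equals(defaultCurrency)) {
            return amount; // already in the default currency
        }
        // Assumed key layout: "EUR->USD" maps to the current EUR-to-USD rate.
        BigDecimal rate = currentRates.get(currency + "->" + defaultCurrency);
        if (rate == null) {
            throw new IllegalArgumentException(
                    "No rate for " + currency + "->" + defaultCurrency);
        }
        // Exact product, no rounding; this would feed the extra dynamic field.
        return amount.multiply(rate);
    }
}
```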
The historical max and min exchange rates (the "drift") are now tracked by Solr
in a properties file named after the currency config file. For example, if the
config file is "currency.xml", the properties file is
"currency.xml.drift.properties". This file is designed to work correctly with
replication, and Solr updates it whenever the currency config file is loaded.
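The drift bookkeeping can be sketched with java.util.Properties: each time the
currency config is loaded, the recorded historical [min, max] for every rate is
widened to include the current rate. The "FROM.TO.min"/"FROM.TO.max" key layout
and the DriftTracker name are assumptions, not the patch's actual file format.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.Map;
import java.util.Properties;

/** Hypothetical sketch of tracking historical min/max exchange rates. */
final class DriftTracker {
    private final Properties drift = new Properties();

    void load(Reader in) throws IOException { drift.load(in); }

    void store(Writer out) throws IOException { drift.store(out, "currency drift"); }

    /** Merge the rates just loaded from the currency config into the extremes. */
    void update(Map<String, Double> currentRates) {
        for (Map.Entry<String, Double> e : currentRates.entrySet()) {
            String pair = e.getKey(); // assumed form, e.g. "EUR.USD"
            double rate = e.getValue();
            String minKey = pair + ".min";
            String maxKey = pair + ".max";
            // A pair seen for the first time starts with min == max == rate.
            double min = Double.parseDouble(drift.getProperty(minKey, Double.toString(rate)));
            double max = Double.parseDouble(drift.getProperty(maxKey, Double.toString(rate)));
            drift.setProperty(minKey, Double.toString(Math.min(min, rate)));
            drift.setProperty(maxKey, Double.toString(Math.max(max, rate)));
        }
    }

    double min(String pair) { return Double.parseDouble(drift.getProperty(pair + ".min")); }

    double max(String pair) { return Double.parseDouble(drift.getProperty(pair + ".max")); }
}
```

Because the extremes only ever widen, reloading an older currency.xml cannot
shrink the recorded range.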
To compute an accurate bounding range, it is necessary to compute the max and
min "historical composite exchange rates". "Historical" refers to the fact that
the historical max/min exchange rates are used instead of the current exchange
rate. "Composite" refers to the fact that the max/min exchange rate is taken
over compositions of the historical max/min exchange rates between the source
currency S, each intermediate currency Z, and the target currency T. For
example, to compute the max historical composite exchange rate between USD and
EUR, take the max of x*y over all currencies Z, where x is the max historical
exchange rate USD->Z and y is the max historical exchange rate Z->EUR.
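The composite computation described above can be sketched as follows, with a
single intermediate hop Z and an identity rate of 1 when Z equals S or T, so the
direct rate is also considered. The in-memory "FROM->TO" maps and the
CompositeRates name are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of max/min historical composite exchange rates. */
class CompositeRates {
    private final Map<String, Double> maxRates = new HashMap<>();
    private final Map<String, Double> minRates = new HashMap<>();
    private final Set<String> currencies = new TreeSet<>();

    void recordDrift(String from, String to, double min, double max) {
        minRates.put(from + "->" + to, min);
        maxRates.put(from + "->" + to, max);
        currencies.add(from);
        currencies.add(to);
    }

    private static Double rate(Map<String, Double> rates, String from, String to) {
        if (from.equals(to)) return 1.0; // identity conversion
        return rates.get(from + "->" + to); // null if no rate is known
    }

    /** max over Z of maxRate(S->Z) * maxRate(Z->T); -Infinity if no path exists. */
    double maxComposite(String source, String target) {
        double best = Double.NEGATIVE_INFINITY;
        for (String z : currencies) {
            Double x = rate(maxRates, source, z);
            Double y = rate(maxRates, z, target);
            if (x != null && y != null) best = Math.max(best, x * y);
        }
        return best;
    }

    /** min over Z of minRate(S->Z) * minRate(Z->T); +Infinity if no path exists. */
    double minComposite(String source, String target) {
        double best = Double.POSITIVE_INFINITY;
        for (String z : currencies) {
            Double x = rate(minRates, source, z);
            Double y = rate(minRates, z, target);
            if (x != null && y != null) best = Math.min(best, x * y);
        }
        return best;
    }
}
```

For instance, if the direct USD->EUR rate has drifted within [0.70, 0.80] but
USD->GBP->EUR can reach 0.65 * 1.3 = 0.845, the composite max is 0.845, so the
bound still covers documents whose value was reached through GBP.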
I made an attempt at proving mathematically that this historical composite
exchange rate approach computes a minimal upper bound and maximal lower bound
for the trie query. If necessary I can attach this proof.
Beyond this, I added some intra-query caching and changed the query
construction from the FilteredQuery approach (which seemed to leverage the trie
query inefficiently) to a BooleanQuery. Note that I rely on the second clause
of the BooleanQuery being scored first; this avoids the expensive exchange rate
conversions for documents that fall outside the trie range.
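The evaluation order being relied on can be illustrated in plain Java: a cheap
check of the indexed default-currency value against the composite bounds runs
first, and the exact conversion at current rates runs only for documents inside
that range. This is a stand-in to show the idea, not Lucene's query or scorer
API; the Doc and Converter types are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical two-phase filter mirroring the clause ordering described above. */
final class TwoPhaseCurrencyFilter {
    static final class Doc {
        final int id;
        final double defaultCurrencyValue; // value indexed at the index-time rate
        final double nativeAmount;
        final String nativeCurrency;

        Doc(int id, double defaultCurrencyValue, double nativeAmount, String nativeCurrency) {
            this.id = id;
            this.defaultCurrencyValue = defaultCurrencyValue;
            this.nativeAmount = nativeAmount;
            this.nativeCurrency = nativeCurrency;
        }
    }

    /** Expensive exact conversion into the query's target currency. */
    interface Converter { double toTarget(double amount, String currency); }

    static List<Integer> matches(List<Doc> docs, double lowerBound, double upperBound,
                                 double queryLow, double queryHigh, Converter exact) {
        List<Integer> hits = new ArrayList<>();
        for (Doc d : docs) {
            // Phase 1: cheap range check on the indexed value; skips most documents.
            if (d.defaultCurrencyValue < lowerBound || d.defaultCurrencyValue > upperBound) {
                continue;
            }
            // Phase 2: exact conversion, paid only for candidates inside the bounds.
            double v = exact.toTarget(d.nativeAmount, d.nativeCurrency);
            if (v >= queryLow && v <= queryHigh) hits.add(d.id);
        }
        return hits;
    }
}
```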
However, I ran into a limitation of the current ResourceLoader API: it does not
allow creating or writing new resources, which is needed to maintain the drift
properties file. For now, only SolrResourceLoader is supported; the patch writes
to the local filesystem using its config directory. Other loaders, such as the
new ZkResourceLoader, are not supported, and a non-fatal warning is logged when
one is used. In that case, exchange rate drift is not tracked, so range and
point queries will return incorrect results once the currency.xml file is
updated. It would be nice if it were possible to ask the ResourceLoader for an
OutputStream to a new resource for this purpose.
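The kind of API being asked for might look like the sketch below. Neither the
WritableResourceLoader interface nor openResourceForWrite exists in Solr; both
names are hypothetical, and the filesystem implementation simply mirrors the
config-directory approach the patch currently uses.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical: a resource loader that can also hand out writable streams. */
interface WritableResourceLoader {
    OutputStream openResourceForWrite(String resourceName) throws IOException;
}

/** Hypothetical filesystem-backed variant writing into a config directory. */
final class ConfigDirResourceLoader implements WritableResourceLoader {
    private final Path configDir;

    ConfigDirResourceLoader(Path configDir) {
        this.configDir = configDir.normalize();
    }

    @Override
    public OutputStream openResourceForWrite(String resourceName) throws IOException {
        Path target = configDir.resolve(resourceName).normalize();
        // Refuse names such as "../x" that would escape the config directory.
        if (!target.startsWith(configDir)) {
            throw new IOException("Resource escapes config dir: " + resourceName);
        }
        // Creates the file if missing, truncates it otherwise.
        return Files.newOutputStream(target);
    }
}
```

A ZooKeeper-backed loader could implement the same interface by buffering the
stream and writing the bytes to a znode on close.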
Some limitations:
* The default currency cannot be changed after the initial index; otherwise the
index is effectively corrupt, since the value used for the trie bound is indexed
in the default currency.
* Loss or corruption of the drift file will cause erroneous range and point
queries (matching documents will be omitted from the results, though no
incorrect documents will appear).
* As mentioned above, the only supported ResourceLoaders are
SolrResourceLoaders that respond to getConfigDir(). Please let me know if there
is a safer, more canonical way to store and load Solr-maintained metadata that
lives with the index.
Also note that this has been tested with replication. The only requirement for
replication to work is that the currency.xml and currency.xml.drift.properties
files be included in the replication. One limitation is that if the currency
exchange rates change but no documents are updated, the files will not be
replicated, due to Solr's policy of not replicating files without index changes.
It would be useful to allow this behavior to be overridden. In our case this
isn't a problem, since our index churn is high enough that replication events
happen regularly.
In the end these changes result in accurate currency range queries that perform
nearly as fast as their non-currency counterparts.
> Money FieldType
> ---------------
>
> Key: SOLR-2202
> URL: https://issues.apache.org/jira/browse/SOLR-2202
> Project: Solr
> Issue Type: New Feature
> Components: Schema and Analysis
> Affects Versions: 1.5
> Reporter: Greg Fodor
> Attachments: SOLR-2022-solr-3.patch, SOLR-2202-lucene-1.patch,
> SOLR-2202-solr-1.patch, SOLR-2202-solr-2.patch, SOLR-2202-solr-4.patch,
> SOLR-2202-solr-5.patch, SOLR-2202-solr-6.patch, SOLR-2202-solr-7.patch
>
>
> Attached please find patches to add support for monetary values to
> Solr/Lucene with query-time currency conversion. The following features are
> supported:
> - Point queries (ex: "price:4.00USD")
> - Range queries (ex: "price:[$5.00 TO $10.00]")
> - Sorting.
> - Currency parsing by either currency code or symbol.
> - Symmetric & Asymmetric exchange rates. (Asymmetric exchange rates are
> useful if there are fees associated with exchanging the currency.)
> At indexing time, money fields can be indexed in a native currency. For
> example, if a product on an e-commerce site is listed in Euros, indexing the
> price field as "10.00EUR" will index it appropriately. By altering the
> currency.xml file, the sorting and querying against Solr can take into
> account fluctuations in currency exchange rates without having to re-index
> the documents.
> The new "money" field type is a polyfield that indexes two fields: one
> containing the amount and another containing the currency code or symbol.
> The currency metadata (names, symbols, codes, and exchange rates) is expected
> to be in an XML file referenced by the field type declaration in schema.xml.
> The current patch is factored such that Money utility functions and
> configuration metadata lie in Lucene (see MoneyUtil and CurrencyConfig),
> while the MoneyType and MoneyValueSource lie in Solr. This was meant to
> mirror the work being done on the spatial field types.
> This patch has not yet been deployed to production but will be used to
> power the international search capabilities of the search engine at Etsy.