On Mon, Oct 6, 2014 at 7:41 PM, Tom Hood <[email protected]> wrote:
> Hi,
>
> Thanks for the response.  Agreed.  I have no intention of doing this on a
> large index at all.
>
> We're upgrading from cdh4.2.0, java 1.6.x, blur 0.1.4 to cdh5.1, java
> 1.7.x, blur 0.2.3
>
> I'm just trying to get a warm fuzzy that a small blur index created before
> the upgrade is the same after the upgrade.  I was just planning to dump all
> the records and the terms and diff.  Then I might do some sampled (or heck
> it's small, maybe all of them) lookups of records to see that they are
> returned in both.  I realize this is less than an efficient way to go about
> this for a large index and perhaps I'm fooling myself into thinking this is
> a good test at all.  It's a little better than nothing I guess.

Ah ok, makes sense now.  I've typically just relied on an integration
test suite of the application, which inherently exercises the features
of Blur.

Recently, I changed my entire indexing approach and wanted more
confidence, so I opened up the directories locally and did a
term-by-term comparison across the two indexes using Lucene classes
directly.  But that was more about validating the change in my own
indexing logic than about validating Blur.
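
For a small index, a similar term-by-term check can be sketched on
plain text dumps rather than on the Lucene classes themselves.  The
sketch below is only an illustration under assumptions: it supposes
each index has already been dumped to a file with one
"field<TAB>term<TAB>docFreq" line per term.  That dump format, and
the names read_terms and diff_terms, are hypothetical and not part of
Blur or Lucene.

```python
def read_terms(path):
    """Yield (field, term, doc_freq) from a tab-separated dump file.

    Assumed (hypothetical) line format: field<TAB>term<TAB>docFreq
    """
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            # Split the field off the front and the count off the back,
            # so a term that itself contains a tab is kept intact.
            field, rest = line.split("\t", 1)
            term, doc_freq = rest.rsplit("\t", 1)
            yield field, term, int(doc_freq)


def diff_terms(old_terms, new_terms):
    """Compare two iterables of (field, term, doc_freq) tuples.

    Returns (only_old, only_new, changed), where changed holds
    ((field, term), old_doc_freq, new_doc_freq) entries.
    Everything is held in memory, so this suits small indexes only.
    """
    old = {(f, t): df for f, t, df in old_terms}
    new = {(f, t): df for f, t, df in new_terms}
    only_old = sorted(old.keys() - new.keys())
    only_new = sorted(new.keys() - old.keys())
    changed = sorted(
        (key, old[key], new[key])
        for key in old.keys() & new.keys()
        if old[key] != new[key]
    )
    return only_old, only_new, changed
```

Calling diff_terms(read_terms("before.tsv"), read_terms("after.tsv"))
(file names made up here) and getting three empty lists back would
mean the two dumps agree on every field, term, and document
frequency.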

> Did the binary format of the files in hdfs change between the blur
> versions?  Can I just do a compare of the various shard files?  Would you
> expect such a comparison to work?  Or does it embed a blur version string
> in there somewhere or other stuff like hostnames, etc, that might trip up a
> diff/compare?

Nah, a binary comparison isn't gonna work at all.

Thanks,
--tim
