I'm trying to index my movie DB into Elasticsearch using the MySQL JDBC river.

The database has 3 tables:
movies - one row per movie, with many columns
persons - names of the people who took part in a movie
tags - movie tags

I'm indexing it with a query like this (not the exact query, just pseudo-code
to explain the problem):

SELECT m.*, p.name, t.value
FROM movies m
JOIN persons p ON p.movie_id = m.id
JOIN tags t ON t.movie_id = m.id

There are quite a lot of movies; each has many columns, and each movie has
something like 10-30 persons and 1000 tags.
Because of the joins, every movie row is repeated once per person/tag
combination, so each movie's data is duplicated 10,000-30,000 times in the
result set.
That causes a huge overhead: one indexing run takes more than an hour, and I
need to re-index the data every day.
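
To make the numbers concrete, here is a small sketch of the row blow-up for a single movie. It uses Python with an in-memory SQLite database purely as a stand-in for MySQL. The `title` column and the `joined_row_count` helper are made up for illustration; only `id`, `movie_id`, `name` and `value` come from my schema.

```python
import sqlite3

def joined_row_count(n_persons: int, n_tags: int) -> int:
    """Count the rows the two-join query returns for one movie."""
    db = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
    db.executescript("""
        CREATE TABLE movies  (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE persons (movie_id INTEGER, name TEXT);
        CREATE TABLE tags    (movie_id INTEGER, value TEXT);
        INSERT INTO movies VALUES (1, 'Example');
    """)
    db.executemany("INSERT INTO persons VALUES (1, ?)",
                   [(f"person{i}",) for i in range(n_persons)])
    db.executemany("INSERT INTO tags VALUES (1, ?)",
                   [(f"tag{i}",) for i in range(n_tags)])
    # Same shape as the join above: every person row pairs with every tag row.
    (count,) = db.execute("""
        SELECT COUNT(*)
        FROM movies m
        JOIN persons p ON p.movie_id = m.id
        JOIN tags t    ON t.movie_id = m.id
    """).fetchone()
    db.close()
    return count

print(joined_row_count(10, 1000))  # 10000
print(joined_row_count(30, 1000))  # 30000
```

The row count is the product of the two child-table counts, not their sum, which is why all the movie columns get shipped thousands of times per movie.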

Is there a way to index arrays without duplicating all the data?
I tried splitting the query into three separate ones:

SELECT id as _id, movies.* FROM movies
SELECT movie_id as _id, name FROM persons
SELECT movie_id as _id, value FROM tags

But the documents produced by these queries overwrite each other instead of
being merged into one document per movie.
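
The difference between what I get and what I want can be sketched in plain Python. This is only a toy model under my own assumptions, not the river's actual code: the sample rows, the `persons` and `tags` array field names, and both helper functions are hypothetical.

```python
# Hypothetical rows, shaped like the results of the three queries above.
movie_rows = [{"_id": 1, "title": "Example"}]
person_rows = [{"_id": 1, "name": "Alice"}, {"_id": 1, "name": "Bob"}]
tag_rows = [{"_id": 1, "value": "drama"}, {"_id": 1, "value": "indie"}]

def index_overwriting(*result_sets):
    """What I observe: each row replaces the document stored under its _id."""
    index = {}
    for rows in result_sets:
        for row in rows:
            index[row["_id"]] = {k: v for k, v in row.items() if k != "_id"}
    return index

def index_merging(movies, persons, tags):
    """What I want: one document per movie, with names and tags as arrays."""
    docs = {}
    for row in movies:
        doc = {k: v for k, v in row.items() if k != "_id"}
        doc["persons"] = []
        doc["tags"] = []
        docs[row["_id"]] = doc
    # Assumes every person/tag row refers to a movie that exists.
    for row in persons:
        docs[row["_id"]]["persons"].append(row["name"])
    for row in tags:
        docs[row["_id"]]["tags"].append(row["value"])
    return docs

print(index_overwriting(movie_rows, person_rows, tag_rows))
# {1: {'value': 'indie'}}
print(index_merging(movie_rows, person_rows, tag_rows))
# {1: {'title': 'Example', 'persons': ['Alice', 'Bob'], 'tags': ['drama', 'indie']}}
```

In the merged version each movie's columns appear once and the child rows only add array elements, so the amount of data transferred grows with persons + tags rather than persons x tags.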

Can anybody help me? It looks like the plugin wasn't designed for such cases
and I would need to write my own strategy (but I don't write Java :()



