I'm trying to index my movie DB into ES using the MySQL JDBC river. The problem is that there are 3 tables:

movies - has many columns
persons - names of the people who participated in a movie
tags - movie tags

I'm indexing it with a query like this (not the exact query, just pseudo-code to explain the problem):

    SELECT m.*, p.name, t.value
    FROM movies m
    JOIN persons p ON p.movie_id = m.id
    JOIN tags t ON t.movie_id = m.id

There are quite a lot of movies, each has many columns, and each movie has something like 10-30 persons and 1000 tags as well. Because of the joins, each movie's data is duplicated 10,000-30,000 times in the result set (10-30 persons times 1000 tags). That causes a huge overhead: one indexing run takes more than an hour, but I need to re-index the data every day.

Is there a way to index arrays without duplicating all the data? I tried splitting the query into 3:

    SELECT id AS _id, movies.* FROM movies
    SELECT movie_id AS _id, name FROM persons
    SELECT movie_id AS _id, value FROM tags

But these queries overwrite each other instead of updating the same document. Can anybody help me? It looks like the plugin wasn't designed for such cases and I would need to write my own strategy (however, I don't write Java :( ).

--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/MySQL-JDBC-river-indexing-large-arrays-tp4069168.html
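
One possible way around the overwriting, as a minimal sketch and not something the river plugin does itself: run the three queries separately from a small script, group persons and tags by movie_id in memory, and send one merged document per movie. The Python below assumes the rows have already been fetched from MySQL. The function names, the "persons"/"tags" field names, the "movies" index name and the sample rows are all made up for illustration. The output follows the bulk API's action-line-plus-source-line layout; older Elasticsearch versions may also expect a _type in the action line or in the request URL.

```python
import json
from collections import defaultdict


def build_documents(movie_rows, person_rows, tag_rows):
    """Merge the results of the three separate queries into one document per movie.

    movie_rows:  iterable of dicts, each with an "id" key plus the movie columns
    person_rows: iterable of (movie_id, name) tuples
    tag_rows:    iterable of (movie_id, value) tuples
    """
    # Group person names by movie id, so each movie row is read only once.
    persons = defaultdict(list)
    for movie_id, name in person_rows:
        persons[movie_id].append(name)

    # Same grouping for tags.
    tags = defaultdict(list)
    for movie_id, value in tag_rows:
        tags[movie_id].append(value)

    docs = {}
    for row in movie_rows:
        doc = dict(row)
        movie_id = doc.pop("id")
        doc["persons"] = persons.get(movie_id, [])  # hypothetical field name
        doc["tags"] = tags.get(movie_id, [])        # hypothetical field name
        docs[movie_id] = doc
    return docs


def to_bulk_lines(docs, index_name="movies"):
    """Render documents as newline-delimited JSON: one action line followed
    by one source line per document. index_name is an assumption."""
    lines = []
    for movie_id, doc in docs.items():
        lines.append(json.dumps({"index": {"_index": index_name, "_id": str(movie_id)}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up sample rows standing in for the three query results.
    movies = [{"id": 1, "title": "Movie A"}, {"id": 2, "title": "Movie B"}]
    persons = [(1, "Person X"), (1, "Person Y")]
    tags = [(1, "tag-1"), (2, "tag-2")]
    print(to_bulk_lines(build_documents(movies, persons, tags)), end="")
```

With this shape, the number of rows read is movies + persons + tags instead of their product per movie, and each movie is written once as a single document with two array fields.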
