I have a large set of CSV files (each containing millions of records), so I use seda to take advantage of multi-threading. I split each file into chunks of 50000 lines, process a chunk, and get a List of entity objects, which I then split again and persist to the DB using JPA. Initially I was getting an Out of Heap Memory exception, but after moving to a machine with a higher configuration the heap issue was solved.
But right now the issue is that duplicate records are being inserted into the DB: if there are 1000000 records in the CSV, around 2000000 records end up in the DB. There is no primary key for the records in the CSV files, so I have used Hibernate to generate one. Below is my code (camel-context.xml):

<camelContext xmlns="http://camel.apache.org/schema/spring">

    <route>
        <from uri="file:C:\Users\PPP\Desktop\input?noop=true" />
        <to uri="seda:StageIt" />
    </route>

    <route>
        <from uri="seda:StageIt?concurrentConsumers=1" />
        <split streaming="true">
            <tokenize token="\n" group="50000" />
            <to uri="seda:WriteToFile" />
        </split>
    </route>

    <route>
        <from uri="seda:WriteToFile?concurrentConsumers=8" />
        <setHeader headerName="CamelFileName">
            <simple>${exchangeId}</simple>
        </setHeader>
        <unmarshal ref="bindyDataformat">
            <bindy type="Csv" classType="target.bindy.RealEstate" />
        </unmarshal>
        <split>
            <simple>body</simple>
            <to uri="jpa:target.bindy.RealEstate" />
        </split>
    </route>

</camelContext>

Please help.
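For reference, here is a minimal sketch of what the target.bindy.RealEstate class looks like in spirit (the field names below are placeholders, not my real columns): it serves as both the Bindy CSV mapping and the JPA entity, with the @Id generated by Hibernate because the CSV rows have no natural key.

    package target.bindy;

    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.GenerationType;
    import javax.persistence.Id;

    import org.apache.camel.dataformat.bindy.annotation.CsvRecord;
    import org.apache.camel.dataformat.bindy.annotation.DataField;

    // Sketch: a Bindy-mapped CSV record that is also a JPA entity.
    // Field names are illustrative; the relevant part is the generated key,
    // used because the CSV rows carry no natural primary key.
    @Entity
    @CsvRecord(separator = ",")
    public class RealEstate {

        // Surrogate key assigned by Hibernate at insert time, so it cannot
        // be used to detect whether a row was already persisted.
        @Id
        @GeneratedValue(strategy = GenerationType.AUTO)
        private Long id;

        @DataField(pos = 1)
        private String street;

        @DataField(pos = 2)
        private String city;

        @DataField(pos = 3)
        private double price;

        // getters and setters omitted for brevity
    }

Since the id is only assigned when a record is inserted, every object that reaches the jpa: endpoint is persisted as a new row.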