Finally got around to doing a bit of research using an (admittedly invalid) microbenchmarking technique. The test reads a CSV file and loads it into a two-dimensional matrix data structure.
Here's the code:

    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.FileNotFoundException;
    import java.io.FileReader;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.StringTokenizer;

    public class StringTest {

        public static void main(String[] args) throws IOException, FileNotFoundException {
            // generate a 1000 x 1000 test file
            BufferedWriter writer = new BufferedWriter(new FileWriter("c:/data.dat"));
            for (int i = 0; i < 1000; i++) {
                for (int x = 0; x < 1000; x++) {
                    writer.write("x,");
                }
                writer.write("\n");
                if (i % 100 == 0) {
                    writer.flush();
                }
            }
            writer.flush();
            writer.close();
            split();
            tokenizer();
        }

        public static void split() throws FileNotFoundException, IOException {
            long start = System.currentTimeMillis();
            BufferedReader reader = new BufferedReader(new FileReader("c:/data.dat"));
            String line = null;
            List<String[]> strings = new ArrayList<String[]>();
            while ((line = reader.readLine()) != null) {
                String[] split = line.split(",");
                strings.add(split);
            }
            long end = System.currentTimeMillis();
            System.out.println("elapsed time " + (end - start));
        }

        public static void tokenizer() throws FileNotFoundException, IOException {
            long start = System.currentTimeMillis();
            BufferedReader reader = new BufferedReader(new FileReader("c:/data.dat"));
            String line = null;
            List<List<String>> strings = new ArrayList<List<String>>();
            while ((line = reader.readLine()) != null) {
                // note: with no delimiter argument, StringTokenizer splits on
                // whitespace, not commas -- for a fair comparison this should
                // be new StringTokenizer(line, ",")
                StringTokenizer tokenizer = new StringTokenizer(line);
                List<String> tokens = new ArrayList<String>();
                while (tokenizer.hasMoreTokens()) {
                    tokens.add(tokenizer.nextToken());
                }
                strings.add(tokens);
            }
            long end = System.currentTimeMillis();
            System.out.println("elapsed time " + (end - start));
        }
    }

The result:

    split elapsed time 1375
    tokenizer elapsed time 187

The tokenizer is nearly 10x faster when the data is scaled up. OK, OK, someone will point out that split() is called first and tokenizer() second, so what happens if the order is reversed?

    tokenizer elapsed time 125
    split elapsed time 1500

So my argument still looks reasonably valid, based on this microbenchmark.
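The order-swap test above hints at JIT warm-up effects. A fairer micro-comparison would keep the data in memory (so disk I/O does not dominate the timing), warm both methods up before measuring, and give StringTokenizer the same comma delimiter that split uses. A rough sketch, where the class name, iteration counts, and warm-up rounds are my own choices rather than anything from the original test:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class FairBench {

    // Build the same data set in memory so file reading doesn't skew the numbers.
    static List<String> makeLines(int rows, int cols) {
        StringBuilder sb = new StringBuilder();
        for (int x = 0; x < cols; x++) sb.append("x,");
        String line = sb.toString();
        List<String> lines = new ArrayList<String>();
        for (int i = 0; i < rows; i++) lines.add(line);
        return lines;
    }

    static long timeSplit(List<String> lines) {
        long start = System.nanoTime();
        int total = 0;
        for (String line : lines) total += line.split(",").length;
        long elapsed = System.nanoTime() - start;
        if (total == 0) throw new AssertionError(); // keep the JIT from discarding the work
        return elapsed;
    }

    static long timeTokenizer(List<String> lines) {
        long start = System.nanoTime();
        int total = 0;
        for (String line : lines) {
            // same delimiter as split, so both do comparable work
            StringTokenizer t = new StringTokenizer(line, ",");
            while (t.hasMoreTokens()) { t.nextToken(); total++; }
        }
        long elapsed = System.nanoTime() - start;
        if (total == 0) throw new AssertionError();
        return elapsed;
    }

    public static void main(String[] args) {
        List<String> lines = makeLines(1000, 1000);
        // warm-up rounds so JIT compilation hits both methods before measuring
        for (int i = 0; i < 3; i++) { timeSplit(lines); timeTokenizer(lines); }
        System.out.println("split ns:     " + timeSplit(lines));
        System.out.println("tokenizer ns: " + timeTokenizer(lines));
    }
}
```

Even this is only a sketch; a proper harness would also run many measured iterations and report a distribution, not a single number.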
Now, a follow-up question: if the CSV file is very large, say 1 GB, both functions hit java heap space — not really because of BufferedReader, which streams the file line by line, but because every parsed row is kept in the strings list. Anyone up for fixing the code so these two approaches can be tested against very large data? Anyone? ;)

______________________________________
Sent from my www.pageflakes.com startpage
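P.S. On the heap-space question: one way to let both approaches run over a very large file is to stop accumulating rows and process each line as it is read, so every row becomes garbage immediately after use. A minimal sketch of the streaming variant — it only counts fields, the path is a placeholder, and try-with-resources needs Java 7+:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.StringTokenizer;

public class StreamingCsv {

    // Process one row at a time; nothing is retained across iterations,
    // so heap use stays flat no matter how large the file is.
    static long countFields(String path) throws IOException {
        long fields = 0;
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                StringTokenizer t = new StringTokenizer(line, ",");
                while (t.hasMoreTokens()) {
                    t.nextToken(); // row is dropped after this pass
                    fields++;
                }
            }
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        // placeholder path, matching the original test's file
        String path = args.length > 0 ? args[0] : "c:/data.dat";
        System.out.println("fields: " + countFields(path));
    }
}
```

If the matrix itself is really needed in memory, this won't help; then the options are a bigger -Xmx or a representation cheaper than List<List<String>>.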