RE: RE: [JUG-Indonesia] using StringTokenizer instead of split

Adelwin, Adelwin Wed, 19 May 2010 01:55:09 -0700

This is only loosely related to the code below... but...

Let's say... I write code like this...


String a = "Adelwin" + "Handoyo" + "programmer" + "java";

Is that slow or not? I mean, relative to using a StringBuffer...

Which one is slower?

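The answer... they're equally slow... or equally fast...

Because once it's compiled to a .class... the compiler optimizes it anyway...
and turns it into StringBuffer code...
(Strictly speaking, since every operand here is a string literal, javac folds
the whole expression into the single constant "AdelwinHandoyoprogrammerjava"
at compile time. A buffer, StringBuilder since Java 5 and StringBuffer before
that, is only generated when some operand is not a compile-time constant.)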

What do I mean?

What I mean is... the JDK's optimizations go really far... and they're
pretty sophisticated...
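To make that concrete, here is a minimal sketch (the class and method names are hypothetical) contrasting the all-literal expression from the example with one that has a run-time operand. The all-literal version becomes one interned constant at compile time; the other is built at run time, roughly as the StringBuilder chain shows for javac of this era (Java 5 to 8; JDK 9 and later compile it to an invokedynamic call instead).

```java
public class ConcatDemo {
    // Every operand is a literal, so javac folds this into a single
    // interned constant at compile time; no concatenation runs at all.
    static String constantConcat() {
        return "Adelwin" + "Handoyo" + "programmer" + "java";
    }

    // 'first' is a parameter, not a compile-time constant, so the
    // concatenation runs at run time and yields a newly created String.
    static String runtimeConcat(String first) {
        return first + "Handoyo" + "programmer" + "java";
    }

    // Roughly what javac (Java 5 to 8) generates for runtimeConcat.
    static String builderConcat(String first) {
        return new StringBuilder().append(first).append("Handoyo")
                .append("programmer").append("java").toString();
    }

    public static void main(String[] args) {
        String literal = "AdelwinHandoyoprogrammerjava";
        System.out.println(constantConcat() == literal);              // true: same interned object
        System.out.println(runtimeConcat("Adelwin") == literal);      // false: new object
        System.out.println(runtimeConcat("Adelwin").equals(literal)); // true
        System.out.println(builderConcat("Adelwin").equals(literal)); // true
    }
}
```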

I once ran into a case where... I split a string with String.split()... and
when it was compiled... it got replaced with a StringTokenizer...

I'm not saying that it /will/ happen in all cases...
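Whatever happened in that case, the two calls do not produce the same tokens in general: split keeps empty fields between delimiters, while StringTokenizer silently skips them. A minimal sketch (class and method names are hypothetical) showing the difference on a line with an empty field:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class SplitVsTokenizer {
    // String.split takes a regular expression; empty fields between
    // delimiters are kept, trailing empty strings are dropped.
    static List<String> bySplit(String line) {
        return Arrays.asList(line.split(","));
    }

    // StringTokenizer treats each character of "," as a delimiter
    // and never returns empty tokens.
    static List<String> byTokenizer(String line) {
        List<String> tokens = new ArrayList<String>();
        StringTokenizer st = new StringTokenizer(line, ",");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "a,,b,";
        System.out.println(bySplit(line));     // [a, , b]
        System.out.println(byTokenizer(line)); // [a, b]
    }
}
```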

Adelwin Handoyo  |  Senior Consultant - Wholesale Bank
Standard Chartered Bank
7, Changi Business Park Crescent, Level 3. Singapore (486028)

T : (65) 659 61395  | E adelwin.adel...@sc.com 

________________________________

From: jug-indonesia@yahoogroups.com
[mailto:jug-indone...@yahoogroups.com] On Behalf Of Andrian Kurniady
Sent: Wednesday, May 19, 2010 3:42 PM
To: jug-indonesia@yahoogroups.com
Subject: Re: RE: [JUG-Indonesia] using StringTokenizer instead of split

You run out of Java heap space because all the tokens are stored in

List<String[]> strings = new ArrayList<String[]>();

--> this ends up as big as the file itself (minus the delimiters), plus
the overhead of every String object.

I think the difference is that the tokenizer generates its strings on the
fly: each request for a token produces one string, so the tokens don't
all have to go into an array. When processing a long file, that means
you don't need as much memory as the file is large, because each token
can be processed in turn and then discarded once it's no longer used.
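A minimal sketch of that on-the-fly behaviour (class and method names are hypothetical): searching a line for a field. The tokenizer version creates token strings only as far as the match, while the split version builds the whole array before the search even starts.

```java
import java.util.StringTokenizer;

public class LazyTokens {
    // StringTokenizer creates each token String only when nextToken() is
    // called, so returning early means the remaining tokens are never created.
    static int indexOfWithTokenizer(String line, String wanted) {
        StringTokenizer st = new StringTokenizer(line, ",");
        int i = 0;
        while (st.hasMoreTokens()) {
            if (st.nextToken().equals(wanted)) {
                return i;
            }
            i++;
        }
        return -1;
    }

    // split builds the complete array of tokens up front.
    static int indexOfWithSplit(String line, String wanted) {
        String[] parts = line.split(",");
        for (int i = 0; i < parts.length; i++) {
            if (parts[i].equals(wanted)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfWithTokenizer("a,b,c,d", "c")); // 2
        System.out.println(indexOfWithSplit("a,b,c,d", "c"));     // 2
    }
}
```

Both return the same index here because the line has no empty fields; with empty fields the two would count positions differently.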

Or just make the heap bigger (-Xmx) hehe...

-Kurniady

2010/5/19 ifnu <ifnub...@gmail.com>

I finally did some research of my own, using an admittedly invalid
microbenchmarking technique. The test reads a CSV file and loads it into
a two-dimensional matrix data structure.

Here's the code:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class StringTest {
    public static void main(String[] args) throws IOException {
        // Generate a 1000 x 1000 test file: each line is "x," repeated 1000 times
        BufferedWriter writer = new BufferedWriter(new FileWriter("c:/data.dat"));
        for (int i = 0; i < 1000; i++) {
            for (int x = 0; x < 1000; x++) {
                writer.write("x,");
            }
            writer.write("\n");
            if (i % 100 == 0) {
                writer.flush();
            }
        }
        writer.close(); // close() also flushes what is left in the buffer
        split();
        tokenizer();
    }

    // Loads every line into memory as a String[] produced by String.split
    public static void split() throws IOException {
        long start = System.currentTimeMillis();
        BufferedReader reader = new BufferedReader(new FileReader("c:/data.dat"));
        String line = null;
        List<String[]> strings = new ArrayList<String[]>();
        while ((line = reader.readLine()) != null) {
            String[] split = line.split(",");
            strings.add(split);
        }
        reader.close();
        long end = System.currentTimeMillis();
        System.out.println("Split elapsed time " + (end - start));
    }

    // Loads every line into memory as a List<String> produced by StringTokenizer
    public static void tokenizer() throws IOException {
        long start = System.currentTimeMillis();
        BufferedReader reader = new BufferedReader(new FileReader("c:/data.dat"));
        String line = null;
        List<List<String>> strings = new ArrayList<List<String>>();
        while ((line = reader.readLine()) != null) {
            // The "," delimiter must be passed explicitly: the default delimiters
            // are whitespace, so new StringTokenizer(line) would return each whole
            // line as a single token and do far less work than split(",").
            StringTokenizer tokenizer = new StringTokenizer(line, ",");
            List<String> tokens = new ArrayList<String>();
            while (tokenizer.hasMoreTokens()) {
                tokens.add(tokenizer.nextToken());
            }
            strings.add(tokens);
        }
        reader.close();
        long end = System.currentTimeMillis();
        System.out.println("tokenizer elapsed time " + (end - start));
    }
}

Results:

Split elapsed time 1375
tokenizer elapsed time 187

With bigger data the tokenizer gets close to 10x faster. OK, OK, someone
said split gets called first and then the tokenizer, so what if the order
is reversed?

tokenizer elapsed time 125
Split elapsed time 1500

Based on this microbenchmark, my argument still seems to hold up
reasonably well.
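Since the test above is self-described as an invalid microbenchmark, here is a minimal sketch of a somewhat fairer one, under stated assumptions: the data is built in memory so file I/O is not timed, both variants split on the same "," delimiter, and a few warm-up rounds run before timing. Class and method names are hypothetical, and timings will vary by machine and JVM, so no numbers are claimed here.

```java
import java.util.StringTokenizer;

public class WarmBench {
    // Builds one CSV-like line of n "x," fields in memory.
    static String makeLine(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append("x,");
        }
        return sb.toString();
    }

    // Splits the same line 'reps' times and returns the total token count.
    static long splitCount(String line, int reps) {
        long total = 0;
        for (int r = 0; r < reps; r++) {
            total += line.split(",").length;
        }
        return total;
    }

    // Tokenizes the same line 'reps' times and returns the total token count.
    static long tokenizerCount(String line, int reps) {
        long total = 0;
        for (int r = 0; r < reps; r++) {
            StringTokenizer st = new StringTokenizer(line, ",");
            while (st.hasMoreTokens()) {
                st.nextToken();
                total++;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String line = makeLine(1000);
        int reps = 1000;
        // Warm-up rounds give the JIT a chance to compile both paths before timing.
        for (int i = 0; i < 5; i++) {
            splitCount(line, reps);
            tokenizerCount(line, reps);
        }
        long t0 = System.nanoTime();
        long a = splitCount(line, reps);
        long t1 = System.nanoTime();
        long b = tokenizerCount(line, reps);
        long t2 = System.nanoTime();
        // Printing the counts keeps the results in use, which helps stop the
        // JIT from eliminating the work as dead code.
        System.out.println("split     " + (t1 - t0) / 1000000 + " ms, tokens " + a);
        System.out.println("tokenizer " + (t2 - t1) / 1000000 + " ms, tokens " + b);
    }
}
```

Checking that both token counts match is a cheap way to confirm the two variants really do the same work.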

A follow-up question: if the CSV file is very large, say 1 GB, both
functions run out of Java heap space, though I think that's more because
I'm using BufferedReader. Is anyone up for the challenge of fixing the
code so both approaches can be tested against really large data?
Anyone? ;)
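One possible take on the challenge, sketched under the assumption that only a per-line aggregate (here, a token count) is needed: neither version keeps its tokens, so memory use is bounded by the longest line instead of by the file size, and both can be timed against a 1 GB file. The class and method names are hypothetical; the default path is the file generated by the test above.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.StringTokenizer;

public class LargeFileTest {
    // Counts fields with String.split; only the current line's array is alive.
    static long countWithSplit(Reader in) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        long count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            count += line.split(",").length;
        }
        return count;
    }

    // Counts fields with StringTokenizer; each token is created, counted, and dropped.
    static long countWithTokenizer(Reader in) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        long count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            StringTokenizer st = new StringTokenizer(line, ",");
            while (st.hasMoreTokens()) {
                st.nextToken();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "c:/data.dat";

        long start = System.currentTimeMillis();
        try (Reader in = new FileReader(path)) {
            long tokens = countWithSplit(in);
            System.out.println("Split elapsed time "
                    + (System.currentTimeMillis() - start) + " (" + tokens + " tokens)");
        }

        start = System.currentTimeMillis();
        try (Reader in = new FileReader(path)) {
            long tokens = countWithTokenizer(in);
            System.out.println("tokenizer elapsed time "
                    + (System.currentTimeMillis() - start) + " (" + tokens + " tokens)");
        }
    }
}
```

Run it as `java LargeFileTest path/to/big.csv`. Note that BufferedReader.readLine() still materializes one whole line at a time, so a file consisting of a single enormous line would still need that much heap.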
